* Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
@ 2021-01-17 18:58 Bradley Chapman
  2021-01-18  4:36 ` Chaitanya Kulkarni
  0 siblings, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-17 18:58 UTC (permalink / raw)
  To: linux-nvme

All,

I recently plugged a 256GB SPCC NVMe 1.3 drive into the secondary slot
on my Asus X570-P motherboard, which runs a Ryzen 5 3600 CPU. I
partitioned and formatted the drive; it is detected as follows by the
5.9.15 and 5.10.6 kernels:

[    1.653074] nvme nvme1: pci function 0000:04:00.0
[    1.657181] nvme nvme1: missing or invalid SUBNQN field.
[    1.662294] nvme nvme1: allocated 64 MiB host memory buffer.
[    1.663105] nvme nvme1: 15/0/0 default/read/poll queues
[    1.665815]  nvme1n1: p1

However, any I/O to the drive (including mounting its filesystem) causes 
the following errors to appear in the dmesg. These errors occur with 
both the 5.9.15 kernel and the 5.10.6 kernel, and with X570-P BIOS 
version 1406 and version 3001. I have modified the BIOS settings to 
specify that a GEN 3 device is plugged into the M.2_2 slot instead of 
allowing the BIOS to auto-detect the drive.

[ 2745.659502] refcount_t: underflow; use-after-free.
[ 2745.659510] WARNING: CPU: 2 PID: 0 at lib/refcount.c:28 
refcount_warn_saturate+0xab/0xf0
[ 2745.659510] Modules linked in: rfcomm(E) cmac(E) bnep(E) 
binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) uas(E) 
usb_storage(E) btusb(E) btrtl(E) crct10dif_pclmul(E) btbcm(E) 
crc32_pclmul(E) btintel(E) ghash_clmulni_intel(E) bluetooth(E) rfkill(E) 
aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) efi_pstore(E) 
jitterentropy_rng(E) drbg(E) ccp(E) ansi_cprng(E) ecdh_generic(E) ecc(E) 
acpi_cpufreq(E) nft_counter(E) efivarfs(E) crc32c_intel(E)
[ 2745.659527] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G            E 
5.10.6-BET #1
[ 2745.659528] Hardware name: System manufacturer System Product 
Name/PRIME X570-P, BIOS 3001 12/04/2020
[ 2745.659529] RIP: 0010:refcount_warn_saturate+0xab/0xf0
[ 2745.659530] Code: 05 af d2 72 01 01 e8 7a 06 87 00 0f 0b c3 80 3d 9d 
d2 72 01 00 75 90 48 c7 c7 78 60 44 af c6 05 8d d2 72 01 01 e8 5b 06 87 
00 <0f> 0b c3 80 3d 7c d2 72 01 00 0f 85 6d ff ff ff 48 c7 c7 d0 60 44
[ 2745.659531] RSP: 0018:ffffaf1880298f30 EFLAGS: 00010086
[ 2745.659532] RAX: 0000000000000000 RBX: ffff9873cf3bc300 RCX: 
0000000000000027
[ 2745.659533] RDX: 0000000000000027 RSI: ffff987acea92e80 RDI: 
ffff987acea92e88
[ 2745.659533] RBP: ffff9873d0e661f0 R08: 0000000000000000 R09: 
c0000000ffffdfff
[ 2745.659534] R10: ffffaf1880298d50 R11: ffffaf1880298d48 R12: 
0000000000000001
[ 2745.659534] R13: ffff9873d0f98580 R14: ffff9873cdf8ac00 R15: 
0000000000000000
[ 2745.659535] FS:  0000000000000000(0000) GS:ffff987acea80000(0000) 
knlGS:0000000000000000
[ 2745.659536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2745.659536] CR2: 00005588763902c8 CR3: 0000000107248000 CR4: 
0000000000350ee0
[ 2745.659537] Call Trace:
[ 2745.659538]  <IRQ>
[ 2745.659541]  nvme_irq+0x104/0x190
[ 2745.659543]  __handle_irq_event_percpu+0x2e/0xd0
[ 2745.659545]  handle_irq_event_percpu+0x33/0x80
[ 2745.659545]  handle_irq_event+0x39/0x70
[ 2745.659547]  handle_edge_irq+0x7c/0x1a0
[ 2745.659549]  asm_call_irq_on_stack+0x12/0x20
[ 2745.659549]  </IRQ>
[ 2745.659551]  common_interrupt+0xd7/0x160
[ 2745.659552]  asm_common_interrupt+0x1e/0x40
[ 2745.659554] RIP: 0010:cpuidle_enter_state+0xd2/0x2e0
[ 2745.659555] Code: e8 93 22 6a ff 31 ff 49 89 c5 e8 29 2c 6a ff 45 84 
ff 74 12 9c 58 f6 c4 02 0f 85 c4 01 00 00 31 ff e8 a2 d8 6f ff fb 45 85 
f6 <0f> 88 c9 00 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 89 ca 48
[ 2745.659556] RSP: 0018:ffffaf188014fe80 EFLAGS: 00000202
[ 2745.659557] RAX: ffff987acea9ce00 RBX: 0000000000000002 RCX: 
000000000000001f
[ 2745.659557] RDX: 0000027f460f1f90 RSI: 00000000239f5229 RDI: 
0000000000000000
[ 2745.659558] RBP: ffff9873c1a4e800 R08: 0000000000000002 R09: 
000000000001c600
[ 2745.659558] R10: 0000090da145abf0 R11: ffff987acea9be24 R12: 
ffffffffaf6d38e0
[ 2745.659559] R13: 0000027f460f1f90 R14: 0000000000000002 R15: 
0000000000000000
[ 2745.659561]  cpuidle_enter+0x30/0x50
[ 2745.659562]  do_idle+0x24f/0x290
[ 2745.659564]  cpu_startup_entry+0x1b/0x20
[ 2745.659566]  start_secondary+0x10b/0x150
[ 2745.659567]  secondary_startup_64_no_verify+0xb0/0xbb
[ 2745.659569] ---[ end trace be84281f034198f3 ]---
[ 2776.138874] nvme nvme1: I/O 414 QID 3 timeout, aborting
[ 2776.138886] nvme nvme1: I/O 415 QID 3 timeout, aborting
[ 2776.138891] nvme nvme1: I/O 416 QID 3 timeout, aborting
[ 2776.138895] nvme nvme1: I/O 417 QID 3 timeout, aborting
[ 2776.138912] nvme nvme1: Abort status: 0x0
[ 2776.138921] nvme nvme1: I/O 428 QID 3 timeout, aborting
[ 2776.138922] nvme nvme1: Abort status: 0x0
[ 2776.138925] nvme nvme1: Abort status: 0x0
[ 2776.138974] nvme nvme1: Abort status: 0x0
[ 2776.138977] nvme nvme1: Abort status: 0x0
[ 2806.346792] nvme nvme1: I/O 414 QID 3 timeout, reset controller
[ 2806.363566] nvme nvme1: 15/0/0 default/read/poll queues
[ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
[ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672121] nvme nvme1: failed to mark controller live state
[ 2836.672123] nvme nvme1: Removing after probe failure status: -19
[ 2836.689016] Aborting journal on device dm-0-8.
[ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, 
lost sync page write
[ 2836.689027] JBD2: Error -5 detected when updating journal superblock 
for dm-0-8.

[ 2836.723821] percpu ref (hd_struct_free) <= 0 (-28) after switching to 
atomic
[ 2836.723828] WARNING: CPU: 8 PID: 0 at lib/percpu-refcount.c:196 
percpu_ref_switch_to_atomic_rcu+0x139/0x140
[ 2836.723828] Modules linked in: rfcomm(E) cmac(E) bnep(E) 
binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) uas(E) 
usb_storage(E) btusb(E) btrtl(E) crct10dif_pclmul(E) btbcm(E) 
crc32_pclmul(E) btintel(E) ghash_clmulni_intel(E) bluetooth(E) rfkill(E) 
aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) efi_pstore(E) 
jitterentropy_rng(E) drbg(E) ccp(E) ansi_cprng(E) ecdh_generic(E) ecc(E) 
acpi_cpufreq(E) nft_counter(E) efivarfs(E) crc32c_intel(E)
[ 2836.723844] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G        W   E 
5.10.6-BET #1
[ 2836.723845] Hardware name: System manufacturer System Product 
Name/PRIME X570-P, BIOS 3001 12/04/2020
[ 2836.723847] RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x139/0x140
[ 2836.723848] Code: 80 3d f9 f0 72 01 00 0f 85 52 ff ff ff 49 8b 54 24 
e0 49 8b 74 24 e8 48 c7 c7 88 5f 44 af c6 05 db f0 72 01 01 e8 ad 24 87 
00 <0f> 0b e9 2e ff ff ff 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec
[ 2836.723849] RSP: 0018:ffffaf18803a0f20 EFLAGS: 00010282
[ 2836.723850] RAX: 0000000000000000 RBX: 7fffffffffffffe3 RCX: 
0000000000000027
[ 2836.723850] RDX: 0000000000000027 RSI: ffff987acec12e80 RDI: 
ffff987acec12e88
[ 2836.723851] RBP: 0000369db0c0e3c8 R08: 0000000000000000 R09: 
c0000000ffffdfff
[ 2836.723851] R10: ffffaf18803a0d40 R11: ffffaf18803a0d38 R12: 
ffff9873c0bbbda0
[ 2836.723852] R13: ffffffffaf765f10 R14: 0000000000000202 R15: 
ffffffffaf6060c0
[ 2836.723853] FS:  0000000000000000(0000) GS:ffff987acec00000(0000) 
knlGS:0000000000000000
[ 2836.723853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2836.723854] CR2: 0000558899b414c0 CR3: 0000000101cd2000 CR4: 
0000000000350ee0
[ 2836.723854] Call Trace:
[ 2836.723855]  <IRQ>
[ 2836.723859]  rcu_core+0x196/0x420
[ 2836.723862]  __do_softirq+0xc9/0x214
[ 2836.723863]  asm_call_irq_on_stack+0x12/0x20
[ 2836.723864]  </IRQ>
[ 2836.723866]  do_softirq_own_stack+0x31/0x40
[ 2836.723867]  irq_exit_rcu+0x9a/0xa0
[ 2836.723869]  sysvec_apic_timer_interrupt+0x2c/0x80
[ 2836.723870]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2836.723872] RIP: 0010:cpuidle_enter_state+0xd2/0x2e0
[ 2836.723873] Code: e8 93 22 6a ff 31 ff 49 89 c5 e8 29 2c 6a ff 45 84 
ff 74 12 9c 58 f6 c4 02 0f 85 c4 01 00 00 31 ff e8 a2 d8 6f ff fb 45 85 
f6 <0f> 88 c9 00 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 89 ca 48
[ 2836.723874] RSP: 0018:ffffaf188017fe80 EFLAGS: 00000202
[ 2836.723874] RAX: ffff987acec1ce00 RBX: 0000000000000002 RCX: 
000000000000001f
[ 2836.723875] RDX: 0000029479ea3f98 RSI: 00000000239f5229 RDI: 
0000000000000000
[ 2836.723875] RBP: ffff9873c1a4ec00 R08: 0000000000000002 R09: 
000000000001c600
[ 2836.723876] R10: 00000959d0ea6498 R11: ffff987acec1be24 R12: 
ffffffffaf6d38e0
[ 2836.723876] R13: 0000029479ea3f98 R14: 0000000000000002 R15: 
0000000000000000
[ 2836.723878]  cpuidle_enter+0x30/0x50
[ 2836.723880]  do_idle+0x24f/0x290
[ 2836.723882]  cpu_startup_entry+0x1b/0x20
[ 2836.723884]  start_secondary+0x10b/0x150
[ 2836.723885]  secondary_startup_64_no_verify+0xb0/0xbb
[ 2836.723887] ---[ end trace be84281f034198f4 ]---

After these errors are generated, the device becomes inaccessible, and
unmounting its filesystem (the umount does not hang in D state)
generates additional errors:

[ 2868.181018] Buffer I/O error on dev dm-0, logical block 0, lost sync 
page write
[ 2868.181022] EXT4-fs (dm-0): I/O error while writing superblock

After the filesystem is unmounted, the device no longer appears in the
output of lsblk(8), and its device node(s) disappear once the kernel
removes the device. Prior to the I/O failures, the nvme error-log
command reported no errors in any of the 64 log entries present.
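
(For reference, the error log was read with nvme-cli, e.g. "nvme
error-log /dev/nvme1".)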

nvme fw-log and nvme smart-log return the following output:

Firmware Log for device:nvme1
afi  : 0x20

Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 48 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 234
data_units_written                  : 2,149
host_read_commands                  : 4,202
host_write_commands                 : 421,917
controller_busy_time                : 0
power_cycles                        : 7
power_on_hours                      : 11
unsafe_shutdowns                    : 0
media_errors                        : 0
num_err_log_entries                 : 0
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1                : 48 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0

I've checked the kernel change logs and I know that the refcount_t error 
has been occurring in other kernel subsystems and was subsequently fixed 
in recent kernel point releases, so I will be trying to reproduce this 
error with the most recent 5.10 and 5.11-rc kernels.

Any suggestions on what else to try next?

Thanks!

Brad


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-17 18:58 Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free Bradley Chapman
@ 2021-01-18  4:36 ` Chaitanya Kulkarni
  2021-01-18 18:33   ` Bradley Chapman
  0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-18  4:36 UTC (permalink / raw)
  To: chapman6235, linux-nvme

On 1/17/21 11:05 AM, Bradley Chapman wrote:
> [ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
> [ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672121] nvme nvme1: failed to mark controller live state
> [ 2836.672123] nvme nvme1: Removing after probe failure status: -19
> [ 2836.689016] Aborting journal on device dm-0-8.
> [ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, 
> lost sync page write
> [ 2836.689027] JBD2: Error -5 detected when updating journal superblock 
> for dm-0-8.
Without knowing the fs mount/format commands, I can only suspect that
the superblock zeroing is issued as a write-zeroes request, which is
translated into REQ_OP_WRITE_ZEROES, and the controller is not able to
process it, resulting in the error. This analysis may be wrong.
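
For reference, a minimal sketch of the in-kernel path I have in mind,
assuming the 5.10-era block layer; zero_fs_range() is a hypothetical
wrapper, only for illustration:

#include <linux/blkdev.h>

/*
 * blkdev_issue_zeroout() submits REQ_OP_WRITE_ZEROES bios whenever the
 * queue advertises a non-zero max_write_zeroes_sectors limit, and only
 * falls back to writing zero-filled pages when that limit is zero.
 */
static int zero_fs_range(struct block_device *bdev, sector_t sector,
			 sector_t nr_sects)
{
	return blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL, 0);
}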

Can you please share the following details:

nvme id-ns /dev/nvme0n1 -H (we are interested in the oncs part here)

Also, for the above device, what is the value of the block queue
write-zeroes parameter in
/sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes?

You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see if
the problem is with write-zeroes. (blkdiscard -z zeroes the range via
the BLKZEROOUT ioctl, which the kernel services with
REQ_OP_WRITE_ZEROES when the device advertises it.)

Can you please also try the latest nvme tree branch, nvme-5.11?



* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-18  4:36 ` Chaitanya Kulkarni
@ 2021-01-18 18:33   ` Bradley Chapman
  2021-01-20  3:08     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-18 18:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme

Good afternoon!

On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>> <snip/>
> Without knowing the fs mount/format commands, I can only suspect that
> the superblock zeroing is issued as a write-zeroes request, which is
> translated into REQ_OP_WRITE_ZEROES, and the controller is not able to
> process it, resulting in the error. This analysis may be wrong.
> 
> Can you please share the following details:
> 
> nvme id-ns /dev/nvme0n1 -H (we are interested in the oncs part here)

I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1 
works perfectly so far) and here is the result:

NVME Identify Namespace 1:
nsze    : 0x1dcf32b0
ncap    : 0x1dcf32b0
nuse    : 0x1dcf32b0
nsfeat  : 0
   [2:2] : 0     Deallocated or Unwritten Logical Block error Not Supported
   [1:1] : 0     Namespace uses AWUN, AWUPF, and ACWU
   [0:0] : 0     Thin Provisioning Not Supported

nlbaf   : 0
flbas   : 0
   [4:4] : 0     Metadata Transferred in Separate Contiguous Buffer
   [3:0] : 0     Current LBA Format Selected

mc      : 0
   [1:1] : 0     Metadata Pointer Not Supported
   [0:0] : 0     Metadata as Part of Extended Data LBA Not Supported

dpc     : 0
   [4:4] : 0     Protection Information Transferred as Last 8 Bytes of 
Metadata Not Supported
   [3:3] : 0     Protection Information Transferred as First 8 Bytes of 
Metadata Not Supported
   [2:2] : 0     Protection Information Type 3 Not Supported
   [1:1] : 0     Protection Information Type 2 Not Supported
   [0:0] : 0     Protection Information Type 1 Not Supported

dps     : 0
   [3:3] : 0     Protection Information is Transferred as Last 8 Bytes 
of Metadata
   [2:0] : 0     Protection Information Disabled

nmic    : 0
   [0:0] : 0     Namespace Multipath Not Capable

rescap  : 0
   [6:6] : 0     Exclusive Access - All Registrants Not Supported
   [5:5] : 0     Write Exclusive - All Registrants Not Supported
   [4:4] : 0     Exclusive Access - Registrants Only Not Supported
   [3:3] : 0     Write Exclusive - Registrants Only Not Supported
   [2:2] : 0     Exclusive Access Not Supported
   [1:1] : 0     Write Exclusive Not Supported
   [0:0] : 0     Persist Through Power Loss Not Supported

fpi     : 0x80
   [7:7] : 0x1   Format Progress Indicator Supported
   [6:0] : 0     Format Progress Indicator (Remaining 0%)

dlfeat  : 1
   [4:4] : 0     Guard Field of Deallocated Logical Blocks is set to 0xFFFF
   [3:3] : 0     Deallocate Bit in the Write Zeroes Command is Not Supported
   [2:0] : 0x1   Bytes Read From a Deallocated Logical Block and its 
Metadata are 0x00

nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
nsattr  : 0
nvmsetid: 0
anagrpid: 0
endgid  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0000000000000000
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - 
Relative Performance: 0 Best (in use)

> 
> Also, for the above device, what is the value of the block queue
> write-zeroes parameter in
> /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes?

$ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
131584

> 
> You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see if
> the problem is with write-zeroes.

# blkdiscard -z -l 1024 /dev/nvme1n1
blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy

> 
> Can you please also try the latest nvme tree branch, nvme-5.11?
> 

Where do I get that code from? Is it already in the 5.11-rc tree or do I 
need to look somewhere else? I checked https://github.com/linux-nvme but 
I did not see it there.

Brad


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-18 18:33   ` Bradley Chapman
@ 2021-01-20  3:08     ` Chaitanya Kulkarni
  2021-01-21  2:33       ` Bradley Chapman
  0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-20  3:08 UTC (permalink / raw)
  To: chapman6235, linux-nvme

On 1/18/21 10:33 AM, Bradley Chapman wrote:
> Good afternoon!
>
> On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
>> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>>> <snip/>
>> Without knowing the fs mount/format commands, I can only suspect that
>> the superblock zeroing is issued as a write-zeroes request, which is
>> translated into REQ_OP_WRITE_ZEROES, and the controller is not able to
>> process it, resulting in the error. This analysis may be wrong.
>>
>> Can you please share the following details:
>>
>> nvme id-ns /dev/nvme0n1 -H (we are interested in the oncs part here)
> I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1 
> works perfectly so far) and here is the result:
Sorry, my bad: it is supposed to be nvme id-ctrl /dev/nvme0n1 -H
>> Also, for the above device, what is the value of the block queue
>> write-zeroes parameter in
>> /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes?
> $ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
> 131584
So write-zeroes is configured on this setup.
>> You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see if
>> the problem is with write-zeroes.
> # blkdiscard -z -l 1024 /dev/nvme1n1
> blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy
This is exactly what I thought. We need to add a quirk for this model
to make sure we don't advertise write-zeroes support, and let blk-lib
emulate write-zeroes instead.
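
For context, the quirk takes effect in nvme_config_write_zeroes() in
drivers/nvme/host/core.c; a simplified sketch of the 5.10-era logic
from memory (the real code also special-cases an unlimited
max_hw_sectors):

static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
{
	/*
	 * Leave max_write_zeroes_sectors at zero, i.e. never issue
	 * REQ_OP_WRITE_ZEROES, unless the controller claims ONCS
	 * write-zeroes support and is not quirked.
	 */
	if (!(ns->ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES) ||
	    (ns->ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
		return;

	blk_queue_max_write_zeroes_sectors(disk->queue,
			nvme_lba_to_sect(ns, ns->ctrl->max_hw_sectors + 1));
}
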
>> Can you please also try the latest nvme tree branch, nvme-5.11?
>>
> Where do I get that code from? Is it already in the 5.11-rc tree or do I 
> need to look somewhere else? I checked https://github.com/linux-nvme but 
> I did not see it there.
Here is the link: git://git.infradead.org/nvme.git, branch 5.12.
> Brad
>



* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-20  3:08     ` Chaitanya Kulkarni
@ 2021-01-21  2:33       ` Bradley Chapman
  2021-01-21 12:45         ` Niklas Cassel
  0 siblings, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-21  2:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme

Good evening!

On 1/19/21 10:08 PM, Chaitanya Kulkarni wrote:
> On 1/18/21 10:33 AM, Bradley Chapman wrote:
>> Good afternoon!
>>
>> On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
>>> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>>>> <snip/>
>>> Without knowing the fs mount/format commands, I can only suspect that
>>> the superblock zeroing is issued as a write-zeroes request, which is
>>> translated into REQ_OP_WRITE_ZEROES, and the controller is not able to
>>> process it, resulting in the error. This analysis may be wrong.
>>>
>>> Can you please share the following details:
>>>
>>> nvme id-ns /dev/nvme0n1 -H (we are interested in the oncs part here)
>> I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1
>> works perfectly so far) and here is the result:
> Sorry, my bad: it is supposed to be nvme id-ctrl /dev/nvme0n1 -H

$ nvme id-ctrl /dev/nvme1n1 -H

NVME Identify Controller:
vid       : 0x2263
ssvid     : 0x1d97
sn        : P2002287000000001296
mn        : SPCC M.2 PCIe SSD
fr        : V1.0
rab       : 6
ieee      : 000000
cmic      : 0
   [3:3] : 0     ANA not supported
   [2:2] : 0     PCI
   [1:1] : 0     Single Controller
   [0:0] : 0     Single Port

mdts      : 5
cntlid    : 1
ver       : 10300
rtd3r     : 249f0
rtd3e     : 13880
oaes      : 0x200
   [9:9] : 0x1   Firmware Activation Notices Supported
   [8:8] : 0     Namespace Attribute Changed Event Not Supported

ctratt    : 0
   [5:5] : 0     Predictable Latency Mode Not Supported
   [4:4] : 0     Endurance Groups Not Supported
   [3:3] : 0     Read Recovery Levels Not Supported
   [2:2] : 0     NVM Sets Not Supported
   [1:1] : 0     Non-Operational Power State Permissive Not Supported
   [0:0] : 0     128-bit Host Identifier Not Supported

rrls      : 0
oacs      : 0x7
   [8:8] : 0     Doorbell Buffer Config Not Supported
   [7:7] : 0     Virtualization Management Not Supported
   [6:6] : 0     NVMe-MI Send and Receive Not Supported
   [5:5] : 0     Directives Not Supported
   [4:4] : 0     Device Self-test Not Supported
   [3:3] : 0     NS Management and Attachment Not Supported
   [2:2] : 0x1   FW Commit and Download Supported
   [1:1] : 0x1   Format NVM Supported
   [0:0] : 0x1   Security Send and Receive Supported

acl       : 3
aerl      : 3
frmw      : 0x2
   [4:4] : 0     Firmware Activate Without Reset Not Supported
   [3:1] : 0x1   Number of Firmware Slots
   [0:0] : 0     Firmware Slot 1 Read/Write

lpa       : 0xa
   [3:3] : 0x1   Telemetry host/controller initiated log page Supported
   [2:2] : 0     Extended data for Get Log Page Not Supported
   [1:1] : 0x1   Command Effects Log Page Supported
   [0:0] : 0     SMART/Health Log Page per NS Not Supported

elpe      : 63
npss      : 0
avscc     : 0x1
   [0:0] : 0x1   Admin Vendor Specific Commands uses NVMe Format

apsta     : 0
   [0:0] : 0     Autonomous Power State Transitions Not Supported

wctemp    : 354
cctemp    : 363
mtfa      : 0
hmpre     : 16384
hmmin     : 16384
tnvmcap   : 0
unvmcap   : 0
rpmbs     : 0
  [31:24]: 0     Access Size
  [23:16]: 0     Total Size
   [5:3] : 0     Authentication Method
   [2:0] : 0     Number of RPMB Units

edstt     : 5
dsto      : 1
fwug      : 0
kas       : 0
hctma     : 0
   [0:0] : 0     Host Controlled Thermal Management Not Supported

mntmt     : 0
mxtmt     : 0
sanicap   : 0
   [2:2] : 0     Overwrite Sanitize Operation Not Supported
   [1:1] : 0     Block Erase Sanitize Operation Not Supported
   [0:0] : 0     Crypto Erase Sanitize Operation Not Supported

hmminds   : 0
hmmaxd    : 0
nsetidmax : 0
anatt     : 0
anacap    : 0
   [7:7] : 0     Non-zero group ID Not Supported
   [6:6] : 0     Group ID does not change
   [4:4] : 0     ANA Change state Not Supported
   [3:3] : 0     ANA Persistent Loss state Not Supported
   [2:2] : 0     ANA Inaccessible state Not Supported
   [1:1] : 0     ANA Non-optimized state Not Supported
   [0:0] : 0     ANA Optimized state Not Supported

anagrpmax : 0
nanagrpid : 0
sqes      : 0x66
   [7:4] : 0x6   Max SQ Entry Size (64)
   [3:0] : 0x6   Min SQ Entry Size (64)

cqes      : 0x44
   [7:4] : 0x4   Max CQ Entry Size (16)
   [3:0] : 0x4   Min CQ Entry Size (16)

maxcmd    : 0
nn        : 1
oncs      : 0x1d
   [6:6] : 0     Timestamp Not Supported
   [5:5] : 0     Reservations Not Supported
   [4:4] : 0x1   Save and Select Supported
   [3:3] : 0x1   Write Zeroes Supported
   [2:2] : 0x1   Data Set Management Supported
   [1:1] : 0     Write Uncorrectable Not Supported
   [0:0] : 0x1   Compare Supported

fuses     : 0
   [0:0] : 0     Fused Compare and Write Not Supported

fna       : 0x3
   [2:2] : 0     Crypto Erase Not Supported as part of Secure Erase
   [1:1] : 0x1   Crypto Erase Applies to All Namespace(s)
   [0:0] : 0x1   Format Applies to All Namespace(s)

vwc       : 0x5
   [7:3] : 0x2   Reserved
   [0:0] : 0x1   Volatile Write Cache Present

awun      : 0
awupf     : 0
nvscc     : 0
   [0:0] : 0     NVM Vendor Specific Commands uses Vendor Specific Format

nwpc      : 0
   [2:2] : 0     Permanent Write Protect Not Supported
   [1:1] : 0     Write Protect Until Power Supply Not Supported
   [0:0] : 0     No Write Protect and Write Protect Namespace Not Supported

acwu      : 0
sgls      : 0
  [1:0]  : 0     Scatter-Gather Lists Not Supported

mnan      : 0
subnqn    :
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
ctrattr   : 0
   [0:0] : 0     Dynamic Controller Model

msdbd     : 0
ps    0 : mp:3.30W operational enlat:5 exlat:5 rrt:0 rrl:0
           rwt:0 rwl:0 idle_power:- active_power:-

>>> Also, for the above device, what is the value of the block queue
>>> write-zeroes parameter in
>>> /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes?
>> $ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
>> 131584
> So write-zeroes is configured on this setup.
>>> You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see if
>>> the problem is with write-zeroes.
>> # blkdiscard -z -l 1024 /dev/nvme1n1
>> blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy
> This is exactly what I thought. We need to add a quirk for this model
> to make sure we don't advertise write-zeroes support, and let blk-lib
> emulate write-zeroes instead.

I am ready to take patches for the NVMe driver to test this out - this 
device is not a boot device and I have no data on it that needs to be 
preserved.

>>> Can you please also try the latest nvme tree branch, nvme-5.11?
>>>
>> Where do I get that code from? Is it already in the 5.11-rc tree or do I
>> need to look somewhere else? I checked https://github.com/linux-nvme but
>> I did not see it there.
> Here is the link: git://git.infradead.org/nvme.git, branch 5.12.

I tried fetching the entire repo but it was huge and would have taken a 
long time, so I tried to fetch a single branch instead and got this result:

$ git clone --branch 5.12 --single-branch git://git.infradead.org/nvme.git
Cloning into 'nvme'...
warning: Could not find remote branch 5.12 to clone.
fatal: Remote branch 5.12 not found in upstream origin

I haven't compiled any out-of-tree kernel code in a very long time - how 
easy is it to add this code to a kernel tree and compile it into the 
kernel once I've figured out how to get it?

Brad


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-21  2:33       ` Bradley Chapman
@ 2021-01-21 12:45         ` Niklas Cassel
  2021-01-22  2:32           ` Bradley Chapman
  2021-01-22  2:54           ` Bradley Chapman
  0 siblings, 2 replies; 17+ messages in thread
From: Niklas Cassel @ 2021-01-21 12:45 UTC (permalink / raw)
  To: Bradley Chapman; +Cc: linux-nvme, Chaitanya Kulkarni

On Wed, Jan 20, 2021 at 09:33:08PM -0500, Bradley Chapman wrote:
> > > > Can you please also try the latest nvme tree branch, nvme-5.11?
> > > > 
> > > Where do I get that code from? Is it already in the 5.11-rc tree or do I
> > > need to look somewhere else? I checked https://github.com/linux-nvme but
> > > I did not see it there.
> > Here is the link: git://git.infradead.org/nvme.git, branch 5.12.
> 
> I tried fetching the entire repo but it was huge and would have taken a long
> time, so I tried to fetch a single branch instead and got this result:
> 
> $ git clone --branch 5.12 --single-branch git://git.infradead.org/nvme.git
> Cloning into 'nvme'...
> warning: Could not find remote branch 5.12 to clone.
> fatal: Remote branch 5.12 not found in upstream origin
> 
> I haven't compiled any out-of-tree kernel code in a very long time - how
> easy is it to add this code to a kernel tree and compile it into the kernel
> once I've figured out how to get it?

Hello there,

You can see the available branches by replacing git:// with https:// i.e.:
https://git.infradead.org/nvme.git

The branch is called nvme-5.12

It is not out-of-tree kernel code; it is a subsystem git tree,
so you build the kernel as usual.
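
For example, from a checkout of the nvme-5.12 branch, something along
these lines (adjust to your distro's usual kernel build steps):

$ cp /boot/config-$(uname -r) .config
$ make olddefconfig
$ make -j$(nproc)
# make modules_install install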

If you already have a kernel git tree somewhere,
simply add an additional remote, and it should be quick:

$ git remote add nvme git://git.infradead.org/nvme.git && git fetch nvme
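
and then build from that branch, e.g.:

$ git checkout -b nvme-5.12 nvme/nvme-5.12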


Kind regards,
Niklas

* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-21 12:45         ` Niklas Cassel
@ 2021-01-22  2:32           ` Bradley Chapman
  2021-01-22  2:54             ` Chaitanya Kulkarni
  2021-01-22  2:54           ` Bradley Chapman
  1 sibling, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-22  2:32 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-nvme, Chaitanya Kulkarni

Good evening,

On 1/21/21 7:45 AM, Niklas Cassel wrote:
> On Wed, Jan 20, 2021 at 09:33:08PM -0500, Bradley Chapman wrote:
>> <snip/>
> 
> Hello there,
> 
> You can see the available branches by replacing git:// with https:// i.e.:
> https://git.infradead.org/nvme.git
> 
> The branch is called nvme-5.12
> 
> It is not out-of-tree kernel code; it is a subsystem git tree,
> so you build the kernel as usual.
> 
> If you already have a kernel git tree somewhere,
> simply add an additional remote, and it should be quick:
> 
> $ git remote add nvme git://git.infradead.org/nvme.git && git fetch nvme

Thanks for the pointer. I've downloaded the code and will add it to a 
stable 5.10 tree and a 5.11-rc tree and see what happens.

> 
> 
> Kind regards,
> Niklas
> 

Brad


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-22  2:32           ` Bradley Chapman
@ 2021-01-22  2:54             ` Chaitanya Kulkarni
  0 siblings, 0 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-22  2:54 UTC (permalink / raw)
  To: chapman6235; +Cc: linux-nvme

On 1/21/21 6:32 PM, Bradley Chapman wrote:
> Thanks for the pointer. I've downloaded the code and will add it to a 
> stable 5.10 tree and a 5.11-rc tree and see what happens.
>
Please use the latest 5.12 branch and boot into that kernel.
If you can provide the device's vendor ID and device ID, I can cook up
the patch for you based on 5.12; I will be waiting for your response.

These IDs can be found with:
cat  /sys/bus/pci/devices/<your device id>/device
cat  /sys/bus/pci/devices/<your device id>/vendor



* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-21 12:45         ` Niklas Cassel
  2021-01-22  2:32           ` Bradley Chapman
@ 2021-01-22  2:54           ` Bradley Chapman
  2021-01-22  2:57             ` Chaitanya Kulkarni
  1 sibling, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-22  2:54 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-nvme, Chaitanya Kulkarni

Good evening!

On 1/21/21 7:45 AM, Niklas Cassel wrote:
> On Wed, Jan 20, 2021 at 09:33:08PM -0500, Bradley Chapman wrote:
>> <snip/>
> 
> Hello there,
> 
> You can see the available branches by replacing git:// with https:// i.e.:
> https://git.infradead.org/nvme.git
> 
> The branch is called nvme-5.12
> 
> It is not out-of-tree kernel code; it is a subsystem git tree,
> so you build the kernel as usual.
> 
> If you already have a kernel git tree somewhere,
> simply add an additional remote, and it should be quick:
> 
> $ git remote add nvme git://git.infradead.org/nvme.git && git fetch nvme
> 
> 
> Kind regards,
> Niklas
> 

I compiled the kernel from the above git tree, rebooted and attempted to 
mount the filesystem on the NVMe drive. This is what the kernel put into 
the dmesg when I attempted to list the contents of the filesystem root, 
create an inode for a zero-byte file and then unmount the filesystem.

Brad

<snip/>

[   52.795975] refcount_t: underflow; use-after-free.
[   52.795981] WARNING: CPU: 7 PID: 0 at lib/refcount.c:28 
refcount_warn_saturate+0xab/0xf0
[   52.795989] Modules linked in: rfcomm(E) cmac(E) bnep(E) 
binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) btusb(E) 
btrtl(E) btbcm(E) btintel(E) intel_rapl_common(E) iosf_mbi(E) 
crct10dif_pclmul(E) crc32_pclmul(E) bluetooth(E) ghash_clmulni_intel(E) 
rfkill(E) jitterentropy_rng(E) aesni_intel(E) crypto_simd(E) 
efi_pstore(E) cryptd(E) glue_helper(E) drbg(E) ccp(E) ansi_cprng(E) 
ecdh_generic(E) ecc(E) acpi_cpufreq(E) nft_counter(E) efivarfs(E) 
crc32c_intel(E)
[   52.796018] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G            E 
  5.11.0-rc1-BET+ #1
[   52.796021] Hardware name: System manufacturer System Product 
Name/PRIME X570-P, BIOS 3001 12/04/2020
[   52.796023] RIP: 0010:refcount_warn_saturate+0xab/0xf0
[   52.796026] Code: 05 02 a0 72 01 01 e8 49 7d 8b 00 0f 0b c3 80 3d f0 
9f 72 01 00 75 90 48 c7 c7 88 4c c7 8a c6 05 e0 9f 72 01 01 e8 2a 7d 8b 
00 <0f> 0b c3 80 3d cf 9f 72 01 00 0f 85 6d ff ff ff 48 c7 c7 e0 4c c7
[   52.796028] RSP: 0018:ffffa95b80374f28 EFLAGS: 00010082
[   52.796031] RAX: 0000000000000000 RBX: ffff9ac74f014800 RCX: 
0000000000000027
[   52.796032] RDX: 0000000000000027 RSI: ffff9ace4ebd2ed0 RDI: 
ffff9ace4ebd2ed8
[   52.796034] RBP: ffff9ac753820080 R08: 0000000000000000 R09: 
c0000000ffffdfff
[   52.796035] R10: ffffa95b80374d48 R11: ffffa95b80374d40 R12: 
0000000000000001
[   52.796037] R13: ffff9ac7539e2100 R14: 0000000000000016 R15: 
0000000000000000
[   52.796038] FS:  0000000000000000(0000) GS:ffff9ace4ebc0000(0000) 
knlGS:0000000000000000
[   52.796040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   52.796042] CR2: 00007f3eb6493000 CR3: 00000006afe12000 CR4: 
0000000000350ee0
[   52.796043] Call Trace:
[   52.796045]  <IRQ>
[   52.796046]  nvme_irq+0x10b/0x190
[   52.796052]  __handle_irq_event_percpu+0x2e/0xd0
[   52.796056]  handle_irq_event_percpu+0x33/0x80
[   52.796058]  handle_irq_event+0x39/0x70
[   52.796060]  handle_edge_irq+0x7c/0x1a0
[   52.796064]  asm_call_irq_on_stack+0x12/0x20
[   52.796068]  </IRQ>
[   52.796069]  common_interrupt+0xd7/0x160
[   52.796073]  asm_common_interrupt+0x1e/0x40
[   52.796076] RIP: 0010:cpuidle_enter_state+0xd2/0x2e0
[   52.796080] Code: e8 73 ca 65 ff 31 ff 49 89 c5 e8 09 d4 65 ff 45 84 
ff 74 12 9c 58 f6 c4 02 0f 85 c4 01 00 00 31 ff e8 d2 8a 6b ff fb 45 85 
f6 <0f> 88 c9 00 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 89 ca 48
[   52.796082] RSP: 0018:ffffa95b80177e80 EFLAGS: 00000202
[   52.796084] RAX: ffff9ace4ebdce80 RBX: 0000000000000002 RCX: 
000000000000001f
[   52.796085] RDX: 0000000c4ae2908c RSI: 00000000239f5229 RDI: 
0000000000000000
[   52.796086] RBP: ffff9ac74e561400 R08: 0000000000000002 R09: 
000000000001c680
[   52.796088] R10: 0000003ae7504a4c R11: ffff9ace4ebdbe64 R12: 
ffffffff8aed3d20
[   52.796089] R13: 0000000c4ae2908c R14: 0000000000000002 R15: 
0000000000000000
[   52.796092]  cpuidle_enter+0x30/0x50
[   52.796095]  do_idle+0x24f/0x290
[   52.796098]  cpu_startup_entry+0x1b/0x20
[   52.796100]  start_secondary+0x11b/0x160
[   52.796103]  secondary_startup_64_no_verify+0xb0/0xbb
[   52.796107] ---[ end trace a0a237d707896b40 ]---
[   82.811599] nvme nvme1: I/O 7 QID 8 timeout, aborting
[   82.811613] nvme nvme1: I/O 8 QID 8 timeout, aborting
[   82.811617] nvme nvme1: I/O 9 QID 8 timeout, aborting
[   82.811622] nvme nvme1: I/O 10 QID 8 timeout, aborting
[   82.811650] nvme nvme1: Abort status: 0x0
[   82.811665] nvme nvme1: Abort status: 0x0
[   82.811668] nvme nvme1: Abort status: 0x0
[   82.811670] nvme nvme1: Abort status: 0x0
[  113.019489] nvme nvme1: I/O 7 QID 8 timeout, reset controller
[  113.037771] nvme nvme1: 15/0/0 default/read/poll queues
[  143.228062] nvme nvme1: I/O 8 QID 8 timeout, disable controller
[  143.346027] blk_update_request: I/O error, dev nvme1n1, sector 16350 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346039] blk_update_request: I/O error, dev nvme1n1, sector 16093 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346044] blk_update_request: I/O error, dev nvme1n1, sector 15836 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346047] blk_update_request: I/O error, dev nvme1n1, sector 15579 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346049] blk_update_request: I/O error, dev nvme1n1, sector 15322 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346052] blk_update_request: I/O error, dev nvme1n1, sector 15065 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346055] blk_update_request: I/O error, dev nvme1n1, sector 14808 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346057] blk_update_request: I/O error, dev nvme1n1, sector 14551 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346060] blk_update_request: I/O error, dev nvme1n1, sector 14294 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346063] blk_update_request: I/O error, dev nvme1n1, sector 14037 
op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  143.346116] nvme nvme1: failed to mark controller live state
[  143.346120] nvme nvme1: Removing after probe failure status: -19
[  143.351776] nvme1n1: detected capacity change from 0 to 500118192
[  143.351836] Aborting journal on device dm-0-8.
[  143.351842] Buffer I/O error on dev dm-0, logical block 25198592, 
lost sync page write
[  143.351846] JBD2: Error -5 detected when updating journal superblock 
for dm-0-8.
[  181.098750] EXT4-fs error (device dm-0): ext4_read_inode_bitmap:203: 
comm touch: Cannot read inode bitmap - block_group = 0, inode_bitmap = 1065
[  181.098792] Buffer I/O error on dev dm-0, logical block 0, lost sync 
page write
[  181.098800] EXT4-fs (dm-0): I/O error while writing superblock
[  181.098806] EXT4-fs error (device dm-0): ext4_journal_check_start:83: 
comm touch: Detected aborted journal
[  181.098811] Buffer I/O error on dev dm-0, logical block 0, lost sync 
page write
[  181.098817] EXT4-fs (dm-0): I/O error while writing superblock
[  181.098819] EXT4-fs (dm-0): Remounting filesystem read-only


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-22  2:54           ` Bradley Chapman
@ 2021-01-22  2:57             ` Chaitanya Kulkarni
  2021-01-22  3:16               ` Chaitanya Kulkarni
  0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-22  2:57 UTC (permalink / raw)
  To: chapman6235; +Cc: linux-nvme

Bradley,

On 1/21/21 6:54 PM, Bradley Chapman wrote:
> I compiled the kernel from the above git tree, rebooted and attempted to 
> mount the filesystem on the NVMe drive. This is what the kernel put into 
> the dmesg when I attempted to list the contents of the filesystem root, 
> create an inode for a zero-byte file and then unmount the filesystem.
>
> Brad
Did you get a chance to see my response to your previous email?


* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-22  2:57             ` Chaitanya Kulkarni
@ 2021-01-22  3:16               ` Chaitanya Kulkarni
  2021-01-23  0:54                 ` Bradley Chapman
  0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-22  3:16 UTC (permalink / raw)
  To: chapman6235; +Cc: linux-nvme

On 1/21/21 6:57 PM, Chaitanya Kulkarni wrote:
> Bradley,
>
> On 1/21/21 6:54 PM, Bradley Chapman wrote:
>> I compiled the kernel from the above git tree, rebooted and attempted to 
>> mount the filesystem on the NVMe drive. This is what the kernel put into 
>> the dmesg when I attempted to list the contents of the filesystem root, 
>> create an inode for a zero-byte file and then unmount the filesystem.
>>
>> Brad
> Did you get a chance to see my response to your previous email ?
>
You can try the following patch with some modifications:

From e162a2e91e4895ceac6f80042a87c4ba6a4fbbf5 Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Date: Thu, 21 Jan 2021 19:05:13 -0800
Subject: [PATCH] nvme-pci: add device quirk wip

This is a work-in-progress patch based on nvme-5.12
HEAD : b116d37fc0f5 nvmet: add lba to sect conversion helpers

Replace <YOUR DEVICE'S VENDOR ID> and <YOUR DEVICE's DEVICE ID> in the
patch below with the actual values from these sysfs entries before you
apply the patch:

cat  /sys/bus/pci/devices/<your device id>/device
cat  /sys/bus/pci/devices/<your device id>/vendor

This patch is not tested at all.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/host/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb..c5b43bcf57b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
         .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
     { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
         .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+    { PCI_DEVICE(<YOUR DEVICE's VENDOR ID>, <YOUR DEVICE's DEVICE ID>),
+        .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
     { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
         .driver_data = NVME_QUIRK_SINGLE_VECTOR },
     { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
-- 
2.22.1
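
Once the IDs are filled in, apply it on top of nvme-5.12 (e.g. with
git am) and rebuild. After booting the patched kernel you can confirm
the quirk took effect, since the driver will no longer advertise a
write-zeroes limit:

$ cat /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes
0

With the limit at zero, the block layer emulates zeroing with regular
writes instead of issuing REQ_OP_WRITE_ZEROES.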





* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-22  3:16               ` Chaitanya Kulkarni
@ 2021-01-23  0:54                 ` Bradley Chapman
  2021-01-25  8:16                   ` Niklas Cassel
  0 siblings, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-23  0:54 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme

Hello sir!

I didn't check my e-mail until this evening, so I saw all four of your 
e-mails at once. I ran the commands you specified based on the following 
information from dmesg and lspci:

dmesg:
[    1.633908] nvme nvme1: pci function 0000:04:00.0

lspci:
04:00.0 Non-Volatile memory controller: Device 1d97:2263 (rev 03)

$ cat /sys/bus/pci/devices/0000\:04\:00.0/device
0x2263

$ cat /sys/bus/pci/devices/0000\:04\:00.0/vendor
0x1d97

On 1/21/21 10:16 PM, Chaitanya Kulkarni wrote:
> On 1/21/21 6:57 PM, Chaitanya Kulkarni wrote:
>> Bradley,
>>
>> On 1/21/21 6:54 PM, Bradley Chapman wrote:
>>> <snip/>
>> Did you get a chance to see my response to your previous email?
>>
> You can try the following patch with some modifications:
> 
> <snip/>
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 25456d02eddb..c5b43bcf57b0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
>           .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>       { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
>           .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
> +    { PCI_DEVICE(<YOUR DEVICE's VENDOR ID>, <YOUR DEVICE's DEVICE ID>),
> +        .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>       { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
>           .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>       { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
> 

With the following patch applied to the NVMe tree, my system hard-locked 
and would not respond to Alt+SysRQ once I mounted the filesystem and 
attempted a directory listing of the root of the filesystem.

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb..7ba5e8e92e19 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
         { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
+               .driver_data = NVME_QUIRK_SINGLE_VECTOR },
         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
                 .driver_data = NVME_QUIRK_SINGLE_VECTOR },
         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },

I don't have a serial console (no serial port, nor suitable cabling to
build one), so I have no console log of what caused the hard lockup,
and since the machine ignored Alt+SysRq+S there are no on-disk logs to
share either. I'm leery of hard-locking the system repeatedly just to
try to capture the dmesg, since I don't want to trash the other
filesystems on this host. What else can I try before I resort to that?
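
One option that needs no serial hardware (a sketch not raised in the
thread; the addresses, interface and MAC below are illustrative) is
netconsole, which forwards the kernel console over UDP to a second
machine:

# on the machine collecting the logs (192.168.1.10 here):
nc -u -l 6666 | tee nvme-lockup.log

# on the test machine:
modprobe netconsole \
  netconsole=6665@192.168.1.20/eth0,6666@192.168.1.10/aa:bb:cc:dd:ee:ff

It won't survive every hard lockup, but it often captures the messages
leading up to one without risking the local filesystems.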

Brad

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-23  0:54                 ` Bradley Chapman
@ 2021-01-25  8:16                   ` Niklas Cassel
  2021-01-25  8:34                     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2021-01-25  8:16 UTC (permalink / raw)
  To: Bradley Chapman; +Cc: linux-nvme, Chaitanya Kulkarni

On Fri, Jan 22, 2021 at 07:54:26PM -0500, Bradley Chapman wrote:
> With the following patch applied to the NVMe tree, my system hard-locked and
> would not respond to Alt+SysRQ once I mounted the filesystem and attempted a
> directory listing of the root of the filesystem.
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 25456d02eddb..7ba5e8e92e19 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
>                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>         { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
>                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
> +       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
> +               .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
>                 .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
> 

Hello Bradley,

Chaitanya asked you to test the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk.
Your patch seems to instead use the NVME_QUIRK_SINGLE_VECTOR quirk.

Did you try the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk?


Kind regards,
Niklas
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-25  8:16                   ` Niklas Cassel
@ 2021-01-25  8:34                     ` Chaitanya Kulkarni
  2021-01-26  2:03                       ` Bradley Chapman
  0 siblings, 1 reply; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-25  8:34 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Bradley Chapman, linux-nvme

I already pointed that out off-list on Friday, to reduce the mailing-list noise.

> On Jan 25, 2021, at 12:16 AM, Niklas Cassel <Niklas.Cassel@wdc.com> wrote:
> 
> On Fri, Jan 22, 2021 at 07:54:26PM -0500, Bradley Chapman wrote:
>> With the following patch applied to the NVMe tree, my system hard-locked and
>> would not respond to Alt+SysRQ once I mounted the filesystem and attempted a
>> directory listing of the root of the filesystem.
>> 
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 25456d02eddb..7ba5e8e92e19 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
>>                .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>        { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
>>                .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>> +       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
>> +               .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>>        { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
>>                .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>>        { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
>> 
> 
> Hello Bradley,
> 
> Chaitanya asked you to test the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk.
> Your patch seems to instead use the NVME_QUIRK_SINGLE_VECTOR quirk.
> 
> Did you try the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk?
> 
> 
> Kind regards,
> Niklas
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-25  8:34                     ` Chaitanya Kulkarni
@ 2021-01-26  2:03                       ` Bradley Chapman
  2021-01-26  2:04                         ` Chaitanya Kulkarni
  0 siblings, 1 reply; 17+ messages in thread
From: Bradley Chapman @ 2021-01-26  2:03 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Niklas Cassel; +Cc: linux-nvme

Good evening!

On 1/25/21 3:34 AM, Chaitanya Kulkarni wrote:
> I already pointed that out off-list on Friday, to reduce the mailing-list noise.
> 
>> On Jan 25, 2021, at 12:16 AM, Niklas Cassel <Niklas.Cassel@wdc.com> wrote:
>>
>> On Fri, Jan 22, 2021 at 07:54:26PM -0500, Bradley Chapman wrote:
>>> With the following patch applied to the NVMe tree, my system hard-locked and
>>> would not respond to Alt+SysRQ once I mounted the filesystem and attempted a
>>> directory listing of the root of the filesystem.
>>>
>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>> index 25456d02eddb..7ba5e8e92e19 100644
>>> --- a/drivers/nvme/host/pci.c
>>> +++ b/drivers/nvme/host/pci.c
>>> @@ -3228,6 +3228,8 @@ static const struct pci_device_id nvme_id_table[] = {
>>>                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>>         { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
>>>                 .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>> +       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
>>> +               .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>>>         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
>>>                 .driver_data = NVME_QUIRK_SINGLE_VECTOR },
>>>         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
>>>
>>
>> Hello Bradley,
>>
>> Chaitanya asked you to test the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk.
>> Your patch seems to instead use the NVME_QUIRK_SINGLE_VECTOR quirk.
>>
>> Did you try the NVME_QUIRK_DISABLE_WRITE_ZEROES quirk?
>>
>>
>> Kind regards,
>> Niklas

As Chaitanya pointed out, I did in fact re-test with the correct patch 
and everything worked flawlessly. I have sent the corrected patches to 
Chaitanya directly.
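
For the archive, the corrected entry pairs the same IDs with the
write-zeroes quirk, roughly as follows (reconstructed here, since the
patch itself went off-list):

+       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
+               .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },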

Brad

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
  2021-01-26  2:03                       ` Bradley Chapman
@ 2021-01-26  2:04                         ` Chaitanya Kulkarni
  0 siblings, 0 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-26  2:04 UTC (permalink / raw)
  To: chapman6235, Niklas Cassel; +Cc: linux-nvme

On 1/25/21 18:03, Bradley Chapman wrote:
> As Chaitanya pointed out, I did in fact re-test with the correct patch 
> and everything worked flawlessly. I have sent the corrected patches to 
> Chaitanya directly.
>
> Brad
>
Thanks for confirming that; I'll send a patch with your Tested-by tag.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-01-26  2:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-17 18:58 Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free Bradley Chapman
2021-01-18  4:36 ` Chaitanya Kulkarni
2021-01-18 18:33   ` Bradley Chapman
2021-01-20  3:08     ` Chaitanya Kulkarni
2021-01-21  2:33       ` Bradley Chapman
2021-01-21 12:45         ` Niklas Cassel
2021-01-22  2:32           ` Bradley Chapman
2021-01-22  2:54             ` Chaitanya Kulkarni
2021-01-22  2:54             ` Chaitanya Kulkarni
2021-01-22  2:54           ` Bradley Chapman
2021-01-22  2:57             ` Chaitanya Kulkarni
2021-01-22  3:16               ` Chaitanya Kulkarni
2021-01-23  0:54                 ` Bradley Chapman
2021-01-25  8:16                   ` Niklas Cassel
2021-01-25  8:34                     ` Chaitanya Kulkarni
2021-01-26  2:03                       ` Bradley Chapman
2021-01-26  2:04                         ` Chaitanya Kulkarni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.