* nvme drive kernel 5.0 problem
@ 2019-05-16  0:20 Iván Chavero
  2019-05-16  3:57 ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Iván Chavero @ 2019-05-16  0:20 UTC (permalink / raw)


Hello,

I have a MacBook Air with an NVMe drive and Fedora installed.
Since kernel 4.16.11 it wouldn't boot properly. Now, with kernel 5.0,
I started tampering with the nvme driver settings and managed to boot,
but there's a problem with these settings; the driver shows these errors:

print_req_error: operation not supported error, dev nvme0n1...

And then the dm driver has problems writing to the device, and the ext4
filesystem remounts the root partition as read-only, which is no good.

The other partitions end up being mounted properly and are writable, so
my hypothesis is that there's a timing/initialization/queue problem with
the driver while booting that causes problems writing to the device.

I'm not very familiar with the driver, so I'm not sure whether the problem
is in my settings or there's a problem with the driver in kernel 5.0.

These are the settings I'm using:
options $kernelopts nvme.poll_queues=2 nvme.io_queue_depth=2
nvme.max_host_mem_size_mb=128 nvme.sgl_threshold=32768 nvme.use_cmb_sqes=Y
nvme_core.default_ps_max_latency_us=100000 nvme_core.force_apst=Y
nvme_core.io_timeout=30 nvme_core.max_retries=5
nvme_core.shutdown_timeout=5 nvme_core.streams=N systemd.unit=multi-user.target
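
For reference, a minimal sketch of how the same parameters could be added
with grubby on Fedora instead of editing the BLS boot entry by hand (this
assumes the stock grubby tool; the values are simply the ones listed above):

  sudo grubby --update-kernel=ALL --args="nvme.poll_queues=2 nvme.io_queue_depth=2 nvme.max_host_mem_size_mb=128 nvme.sgl_threshold=32768 nvme.use_cmb_sqes=Y nvme_core.default_ps_max_latency_us=100000 nvme_core.force_apst=Y nvme_core.io_timeout=30 nvme_core.max_retries=5 nvme_core.shutdown_timeout=5 nvme_core.streams=N"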

These are the relevant lines from the dmesg output:

[    2.330023] nvme nvme0: pci function 0000:04:00.0
[    2.332179] nvme nvme0: detected Apple NVMe controller, set queue
depth=2 to work around controller resets
[    2.335047] nvme nvme0: 1/0/0 default/read/poll queues
[    2.336944] WARNING: CPU: 3 PID: 46 at drivers/pci/msi.c:1278
pci_irq_get_affinity+0x66/0x90
[    2.338834] Modules linked in: nvme(+) drm ghash_clmulni_intel(+)
nvme_core video hid_apple fuse ecryptfs
[    2.340743] CPU: 3 PID: 46 Comm: kworker/u8:1 Not tainted 5.0.10-
300.fc30.x86_64 #1
[    2.342681] Hardware name: Apple Inc. MacBookAir7,1/Mac-
9F18E312C5C2BF0B, BIOS MBA71.88Z.0166.B09.1510261436 10/26/2015
[    2.344666] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[    2.346639] RIP: 0010:pci_irq_get_affinity+0x66/0x90
[    2.348616] Code: 0b 31 c0 c3 83 e2 02 48 c7 c0 a0 18 37 8f 74 30 48
8b 87 c0 02 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77
05 <0f> 0b 31 c0 c3 48 63 f6 48 89 f0 48 c1 e0 04 48 01 c6 48 8d 04 f2
[    2.352395] usb 1-3.3: new full-speed USB device number 6 using
xhci_hcd
[    2.352706] RSP: 0000:ffffb83700e13cd8 EFLAGS: 00010246
[    2.356747] RAX: ffff9fbbde10d540 RBX: 0000000000000000 RCX:
0000000000000004
[    2.358758] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff9fbbe4e28000
[    2.360793] RBP: 0000000000000000 R08: 0000000000000003 R09:
ffff9fbbe5342a90
[    2.362806] R10: ffff9fbbe6403b00 R11: 0000000000000000 R12:
ffff9fbbe4e28000
[    2.364793] R13: 0000000000000000 R14: ffff9fbbdd5f9008 R15:
0000000000000000
[    2.366720] FS:  0000000000000000(0000) GS:ffff9fbbe6b80000(0000)
knlGS:0000000000000000
[    2.368651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.370580] CR2: 000055a44841a000 CR3: 000000025e96c001 CR4:
00000000003606e0
[    2.372490] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    2.374359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    2.375888] Call Trace:
[    2.377577]  blk_mq_pci_map_queues+0x2d/0xb0
[    2.379473]  nvme_pci_map_queues+0x80/0xb0 [nvme]
[    2.381375]  blk_mq_alloc_tag_set+0x118/0x2c0
[    2.381380]  nvme_reset_work+0x13ca/0x1882 [nvme]
[    2.384415]  process_one_work+0x19d/0x380
[    2.384417]  worker_thread+0x1db/0x3b0
[    2.387711]  kthread+0xfb/0x130
[    2.387715]  ? process_one_work+0x380/0x380
[    2.391100]  ? kthread_park+0x90/0x90
[    2.391103]  ret_from_fork+0x35/0x40
[    2.393794] ---[ end trace a39c0e715c7be61d ]---
[    2.399294]  nvme0n1: p1 p2 p3 p4 p5 p6
[    2.472772] usb 1-3.3: New USB device found, idVendor=05ac,
idProduct=828f, bcdDevice= 1.12
[    2.474186] usb 1-3.3: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[    2.475734] usb 1-3.3: Product: Bluetooth USB Host Controller
[    2.477272] usb 1-3.3: Manufacturer: Apple Inc.

...

[   11.561741] print_req_error: operation not supported error, dev
nvme0n1, sector 437502136 flags 801
[   11.561760] EXT4-fs warning (device dm-1): ext4_end_bio:317: I/O
error 1 writing to inode 1974013 (offset 25288704 size 237568 starting
block 6516759)
[   11.561762] Buffer I/O error on device dm-1, logical block 6516759

...

[   11.561801] EXT4-fs (dm-1): Delayed block allocation failed for
inode 1974013 at logical offset 6234 with max blocks 2 with error 95
[   11.561804] EXT4-fs (dm-1): This should not happen!! Data will be
lost

...

[   11.649053] print_req_error: operation not supported error, dev
nvme0n1, sector 437502744 flags 801
[   11.649066] EXT4-fs warning (device dm-1): ext4_end_bio:317: I/O
error 1 writing to inode 1974013 (offset 25571328 size 266240 starting
block 6516835)

...

[   11.714748] print_req_error: operation not supported error, dev
nvme0n1, sector 435962536 flags 801
[   11.714908] Aborting journal on device dm-1-8.
[   11.715472] EXT4-fs error (device dm-1):
ext4_journal_check_start:61: Detected aborted journal 
[   11.715474] EXT4-fs (dm-1): Remounting filesystem read-only


Thanks for the help!
Iván

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-16  0:20 nvme drive kernel 5.0 problem Iván Chavero
@ 2019-05-16  3:57 ` Ming Lei
  2019-05-16  5:50   ` Iván Chavero
  2019-05-20 22:12   ` Iván Chavero
  0 siblings, 2 replies; 10+ messages in thread
From: Ming Lei @ 2019-05-16  3:57 UTC (permalink / raw)


On Thu, May 16, 2019 at 8:39 AM Iván Chavero <ichavero@chavero.com.mx> wrote:
>
> Hello,
>
> I have a MacBook Air with an NVMe drive and Fedora installed.
> Since kernel 4.16.11 it wouldn't boot properly. Now, with kernel 5.0,
> I started tampering with the nvme driver settings and managed to boot,
> but there's a problem with these settings; the driver shows these errors:
>
> print_req_error: operation not supported error, dev nvme0n1...
>
> And then the dm driver has problems writing to the device, and the ext4
> filesystem remounts the root partition as read-only, which is no good.
>
> The other partitions end up being mounted properly and are writable, so
> my hypothesis is that there's a timing/initialization/queue problem with
> the driver while booting that causes problems writing to the device.
>
> I'm not very familiar with the driver, so I'm not sure whether the problem
> is in my settings or there's a problem with the driver in kernel 5.0.
>
> These are the settings I'm using:
> options $kernelopts nvme.poll_queues=2 nvme.io_queue_depth=2
> nvme.max_host_mem_size_mb=128 nvme.sgl_threshold=32768 nvme.use_cmb_sqes=Y
> nvme_core.default_ps_max_latency_us=100000 nvme_core.force_apst=Y
> nvme_core.io_timeout=30 nvme_core.max_retries=5
> nvme_core.shutdown_timeout=5 nvme_core.streams=N systemd.unit=multi-user.target
>
> These are the relevant lines from the dmesg output:
>
> [    2.330023] nvme nvme0: pci function 0000:04:00.0
> [    2.332179] nvme nvme0: detected Apple NVMe controller, set queue
> depth=2 to work around controller resets
> [    2.335047] nvme nvme0: 1/0/0 default/read/poll queues
> [    2.336944] WARNING: CPU: 3 PID: 46 at drivers/pci/msi.c:1278
> pci_irq_get_affinity+0x66/0x90
> [    2.338834] Modules linked in: nvme(+) drm ghash_clmulni_intel(+)
> nvme_core video hid_apple fuse ecryptfs
> [    2.340743] CPU: 3 PID: 46 Comm: kworker/u8:1 Not tainted 5.0.10-
> 300.fc30.x86_64 #1
> [    2.342681] Hardware name: Apple Inc. MacBookAir7,1/Mac-
> 9F18E312C5C2BF0B, BIOS MBA71.88Z.0166.B09.1510261436 10/26/2015
> [    2.344666] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> [    2.346639] RIP: 0010:pci_irq_get_affinity+0x66/0x90
> [    2.348616] Code: 0b 31 c0 c3 83 e2 02 48 c7 c0 a0 18 37 8f 74 30 48
> 8b 87 c0 02 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77
> 05 <0f> 0b 31 c0 c3 48 63 f6 48 89 f0 48 c1 e0 04 48 01 c6 48 8d 04 f2
> [    2.352395] usb 1-3.3: new full-speed USB device number 6 using
> xhci_hcd
> [    2.352706] RSP: 0000:ffffb83700e13cd8 EFLAGS: 00010246
> [    2.356747] RAX: ffff9fbbde10d540 RBX: 0000000000000000 RCX:
> 0000000000000004
> [    2.358758] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff9fbbe4e28000
> [    2.360793] RBP: 0000000000000000 R08: 0000000000000003 R09:
> ffff9fbbe5342a90
> [    2.362806] R10: ffff9fbbe6403b00 R11: 0000000000000000 R12:
> ffff9fbbe4e28000
> [    2.364793] R13: 0000000000000000 R14: ffff9fbbdd5f9008 R15:
> 0000000000000000
> [    2.366720] FS:  0000000000000000(0000) GS:ffff9fbbe6b80000(0000)
> knlGS:0000000000000000
> [    2.368651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.370580] CR2: 000055a44841a000 CR3: 000000025e96c001 CR4:
> 00000000003606e0
> [    2.372490] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    2.374359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    2.375888] Call Trace:
> [    2.377577]  blk_mq_pci_map_queues+0x2d/0xb0
> [    2.379473]  nvme_pci_map_queues+0x80/0xb0 [nvme]
> [    2.381375]  blk_mq_alloc_tag_set+0x118/0x2c0
> [    2.381380]  nvme_reset_work+0x13ca/0x1882 [nvme]
> [    2.384415]  process_one_work+0x19d/0x380
> [    2.384417]  worker_thread+0x1db/0x3b0
> [    2.387711]  kthread+0xfb/0x130
> [    2.387715]  ? process_one_work+0x380/0x380
> [    2.391100]  ? kthread_park+0x90/0x90
> [    2.391103]  ret_from_fork+0x35/0x40
> [    2.393794] ---[ end trace a39c0e715c7be61d ]---
> [    2.399294]  nvme0n1: p1 p2 p3 p4 p5 p6
> [    2.472772] usb 1-3.3: New USB device found, idVendor=05ac,
> idProduct=828f, bcdDevice= 1.12
> [    2.474186] usb 1-3.3: New USB device strings: Mfr=1, Product=2,
> SerialNumber=0
> [    2.475734] usb 1-3.3: Product: Bluetooth USB Host Controller
> [    2.477272] usb 1-3.3: Manufacturer: Apple Inc.

I don't see this issue with the 5.1 kernel; it may be addressed by the
following patches:

4e6b26d23dc1 PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
a6a309edba13 genirq/affinity: Remove the leftovers of the original set support
612b72862b4d nvme-pci: Simplify interrupt allocation
c66d4bd110a1 genirq/affinity: Add new callback for (re)calculating
interrupt sets
9cfef55bb57e genirq/affinity: Store interrupt sets size in struct irq_affinity
0145c30e896d genirq/affinity: Code consolidation
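
For reference, a quick way to check which release first picked up one of
these commits, assuming a clone of the mainline git tree (the commit id is
taken from the list above):

  git describe --contains 612b72862b4d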



Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-16  3:57 ` Ming Lei
@ 2019-05-16  5:50   ` Iván Chavero
  2019-05-20 22:12   ` Iván Chavero
  1 sibling, 0 replies; 10+ messages in thread
From: Iván Chavero @ 2019-05-16  5:50 UTC (permalink / raw)


Thanks!

I'll test the patches or update to 5.1 to see if it gets fixed.

> I don't see this issue with the 5.1 kernel; it may be addressed by the
> following patches:
> 
> 4e6b26d23dc1 PCI/MSI: Remove obsolete sanity checks for multiple
> interrupt sets
> a6a309edba13 genirq/affinity: Remove the leftovers of the original
> set support
> 612b72862b4d nvme-pci: Simplify interrupt allocation
> c66d4bd110a1 genirq/affinity: Add new callback for (re)calculating
> interrupt sets
> 9cfef55bb57e genirq/affinity: Store interrupt sets size in struct
> irq_affinity
> 0145c30e896d genirq/affinity: Code consolidation
> 
> 
> 
> Thanks,
> Ming Lei

Cheers,
Iván

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-16  3:57 ` Ming Lei
  2019-05-16  5:50   ` Iván Chavero
@ 2019-05-20 22:12   ` Iván Chavero
  2019-05-21 14:20     ` Keith Busch
  1 sibling, 1 reply; 10+ messages in thread
From: Iván Chavero @ 2019-05-20 22:12 UTC (permalink / raw)



> I don't see this issue with the 5.1 kernel; it may be addressed by the
> following patches:
> 
> 4e6b26d23dc1 PCI/MSI: Remove obsolete sanity checks for multiple
> interrupt sets
> a6a309edba13 genirq/affinity: Remove the leftovers of the original
> set support
> 612b72862b4d nvme-pci: Simplify interrupt allocation
> c66d4bd110a1 genirq/affinity: Add new callback for (re)calculating
> interrupt sets
> 9cfef55bb57e genirq/affinity: Store interrupt sets size in struct
> irq_affinity
> 0145c30e896d genirq/affinity: Code consolidation
> 
> 
I've tested with the 5.1.3 Fedora kernel and still got the same
behaviour.

I think this might be relevant to solving the problem, but I'm not sure:

[    2.394967] Workqueue: nvme-reset-wq nvme_reset_work [nvme]

[    2.394982] Call Trace:
[    2.394986]  blk_mq_pci_map_queues+0x30/0xc0
[    2.394990]  nvme_pci_map_queues+0x80/0xb0 [nvme]
[    2.394993]  blk_mq_alloc_tag_set+0x11c/0x2c0
[    2.394996]  nvme_reset_work+0xfd6/0x1515 [nvme]
[    2.395000]  ? __switch_to_asm+0x40/0x70
[    2.395001]  ? __switch_to_asm+0x34/0x70
[    2.395003]  ? __switch_to_asm+0x40/0x70
[    2.395005]  ? __switch_to_asm+0x34/0x70
[    2.395007]  process_one_work+0x19d/0x380
[    2.395010]  worker_thread+0x1db/0x3b0
[    2.395011]  kthread+0xfb/0x130
[    2.395013]  ? process_one_work+0x380/0x380
[    2.395014]  ? kthread_park+0x90/0x90
[    2.395016]  ret_from_fork+0x35/0x40
[    2.395018] ---[ end trace 3af2b3afa977ff9e ]---


I think this is a timing problem because the other partitions don't get
mounted read-only.

What could I do to make this work? I'm stuck on kernel 4.16.11 and I
would really like to use the latest kernel.

Cheers,
Iván

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-20 22:12   ` Iván Chavero
@ 2019-05-21 14:20     ` Keith Busch
  2019-05-21 21:34       ` Iván Chavero
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2019-05-21 14:20 UTC (permalink / raw)


On Mon, May 20, 2019 at 05:12:46PM -0500, Iván Chavero wrote:
> > I don't see this issue with the 5.1 kernel; it may be addressed by the
> > following patches:
> > 
> > 4e6b26d23dc1 PCI/MSI: Remove obsolete sanity checks for multiple
> > interrupt sets
> > a6a309edba13 genirq/affinity: Remove the leftovers of the original
> > set support
> > 612b72862b4d nvme-pci: Simplify interrupt allocation
> > c66d4bd110a1 genirq/affinity: Add new callback for (re)calculating
> > interrupt sets
> > 9cfef55bb57e genirq/affinity: Store interrupt sets size in struct
> > irq_affinity
> > 0145c30e896d genirq/affinity: Code consolidation
> > 
> > 
> I've tested with the 5.1.3 Fedora kernel and still got the same
> behaviour.
> 
> I think this might be relevant to solving the problem, but I'm not sure:
> 
> [    2.394967] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> 
> [    2.394982] Call Trace:
> [    2.394986]  blk_mq_pci_map_queues+0x30/0xc0
> [    2.394990]  nvme_pci_map_queues+0x80/0xb0 [nvme]
> [    2.394993]  blk_mq_alloc_tag_set+0x11c/0x2c0
> [    2.394996]  nvme_reset_work+0xfd6/0x1515 [nvme]
> [    2.395000]  ? __switch_to_asm+0x40/0x70
> [    2.395001]  ? __switch_to_asm+0x34/0x70
> [    2.395003]  ? __switch_to_asm+0x40/0x70
> [    2.395005]  ? __switch_to_asm+0x34/0x70
> [    2.395007]  process_one_work+0x19d/0x380
> [    2.395010]  worker_thread+0x1db/0x3b0
> [    2.395011]  kthread+0xfb/0x130
> [    2.395013]  ? process_one_work+0x380/0x380
> [    2.395014]  ? kthread_park+0x90/0x90
> [    2.395016]  ret_from_fork+0x35/0x40
> [    2.395018] ---[ end trace 3af2b3afa977ff9e ]---
> 
> 
> I think this is a timing problem because the other partitions don't get
> mounted read-only.
> 
> What could I do to make this work? I'm stuck on kernel 4.16.11 and I
> would really like to use the latest kernel.

The warning in itself is not necessarily fatal and may not explain why
the filesystem is having issues. It should just mean that you have only
a single MSI interrupt vector, shared with the admin queue, so it's an
unmanaged vector, which creates this warning. It should otherwise be usable.

The following should work around the warning, assuming the vector count
is really what's creating it: managed IRQs should always have an
offset, so no offset should mean no PCI IRQ affinity.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2a8708c9ac18..d55e1d92cf59 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -464,7 +464,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 		 * affinity), so use the regular blk-mq cpu mapping
 		 */
 		map->queue_offset = qoff;
-		if (i != HCTX_TYPE_POLL)
+		if (i != HCTX_TYPE_POLL && offset)
 			blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
 		else
 			blk_mq_map_queues(map);
--
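
For testing, a minimal sketch of how this change could be applied and the
nvme module rebuilt on the affected machine (assuming a configured kernel
source tree matching the running kernel; the patch file name is illustrative):

  cd linux
  patch -p1 < nvme-pci-no-offset.patch
  make modules_prepare
  make M=drivers/nvme/host modules
  sudo make M=drivers/nvme/host modules_install
  sudo depmod -a
  sudo dracut --force    # root is on the nvme device, so refresh the initramfs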

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-21 14:20     ` Keith Busch
@ 2019-05-21 21:34       ` Iván Chavero
  2019-05-21 22:00         ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Iván Chavero @ 2019-05-21 21:34 UTC (permalink / raw)


On 21/05/19 at 9:20 a.m., Keith Busch wrote:
>
> The warning in itself is not necessarily fatal and may not explain why
> the filesystem is having issues. It should just mean that you have only
> a single MSI interrupt vector, shared with the admin queue, so it's an
> unmanaged vector, which creates this warning. It should otherwise be usable.
>
> The following should work around the warning, assuming the vector count
> is really what's creating it: managed IRQs should always have an
> offset, so no offset should mean no PCI IRQ affinity.
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2a8708c9ac18..d55e1d92cf59 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -464,7 +464,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
>  		 * affinity), so use the regular blk-mq cpu mapping
>  		 */
>  		map->queue_offset = qoff;
> -		if (i != HCTX_TYPE_POLL)
> +		if (i != HCTX_TYPE_POLL && offset)
>  			blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
>  		else
>  			blk_mq_map_queues(map);
> --
>

I'm about to test this code, but I wanted to point out that I've also found
these errors before the ext4 driver remounts the filesystem as read-only:

[   11.304823] print_req_error: operation not supported error, dev
nvme0n1, sector 386138112 flags 4801
[   11.305292] print_req_error: operation not supported error, dev
nvme0n1, sector 386139128 flags 4801
[   11.307096] print_req_error: operation not supported error, dev
nvme0n1, sector 386140144 flags 4801
[   11.307194] print_req_error: operation not supported error, dev
nvme0n1, sector 386141160 flags 4801
[   11.307433] print_req_error: operation not supported error, dev
nvme0n1, sector 386142176 flags 4801
[   11.307818] print_req_error: operation not supported error, dev
nvme0n1, sector 386143192 flags 4801
[   11.307939] print_req_error: operation not supported error, dev
nvme0n1, sector 386144208 flags 4801
[   11.308047] print_req_error: operation not supported error, dev
nvme0n1, sector 386145224 flags 4801
[   11.308134] print_req_error: operation not supported error, dev
nvme0n1, sector 386146240 flags 4801
[   11.308255] print_req_error: operation not supported error, dev
nvme0n1, sector 386147256 flags 4801
[   11.309149] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 393675 (offset 0 size 8388608 starting block 98288)
[   11.309152] Buffer I/O error on device dm-1, logical block 96256
[   11.309158] Buffer I/O error on device dm-1, logical block 96257
[   11.309159] Buffer I/O error on device dm-1, logical block 96258
[   11.309163] Buffer I/O error on device dm-1, logical block 96259
[   11.309164] Buffer I/O error on device dm-1, logical block 96260
[   11.309167] Buffer I/O error on device dm-1, logical block 96261
[   11.309169] Buffer I/O error on device dm-1, logical block 96262
[   11.309172] Buffer I/O error on device dm-1, logical block 96263
[   11.309174] Buffer I/O error on device dm-1, logical block 96264
[   11.309177] Buffer I/O error on device dm-1, logical block 96265
[   11.311152] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 393675 (offset 8388608 size 1343488 starting
block 487678)
[   11.383082] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 1966094 (offset 0 size 0 starting block 467904)
[   11.400545] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 1966094 (offset 57806848 size 323584 starting
block 5543713)

...

[   12.525069] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 393675 (offset 0 size 8388608 starting block
491504)
[   12.526452] EXT4-fs warning (device dm-1): ext4_end_bio:318: I/O
error 1 writing to inode 393675 (offset 8388608 size 1343488 starting
block 475390)
[   13.464615] EXT4-fs (dm-3): mounted filesystem with ordered data
mode. Opts: (null)
[   13.468795] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data
mode. Opts: (null)

...

[   13.756080] Aborting journal on device dm-1-8.
[   13.756611] EXT4-fs error (device dm-1): ext4_journal_check_start:61:
Detected aborted journal
[   13.756616] EXT4-fs (dm-1): Remounting filesystem read-only

I think that the real error is this one:

print_req_error: operation not supported error, dev nvme0n1, sector
386138112 flags 4801

but I'm not sure how to interpret this error; my guess is that it is a
synchronization issue.


I should point out that I also have other partitions on this device that
get mounted read-write with no problem.

The partition that has the problem is the root partition, so my guess is
that it is mounted before the driver finishes some synchronization or
initialization tasks, causing the "operation not supported" error; then,
after some errors, the ext4 driver protects the partition and mounts it
read-only.

It might be a problem of tuning the nvme driver parameters. Since I could
not find any parameters for the Apple device, I took the ones that get set
automatically on kernel 4.16.11:

options $kernelopts nvme.poll_queues=2 nvme.io_queue_depth=1024
nvme.max_host_mem_size_mb=128 nvme.sgl_threshold=32768
nvme.use_cmb_sqes=Y nvme_core.default_ps_max_latency_us=100000
nvme_core.force_apst=Y nvme_core.io_timeout=30 nvme_core.max_retries=5
nvme_core.shutdown_timeout=5 nvme_core.streams=N
systemd.unit=multi-user.target


Thanks for the help.


-- 
Iván Chavero
Hacker

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-21 21:34       ` Iván Chavero
@ 2019-05-21 22:00         ` Keith Busch
  2019-05-22  0:50           ` Iván Chavero
       [not found]           ` <CGME20190522005114epcas4p2cae7e26ab679da2e44a50b803724a336@epcms2p7>
  0 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2019-05-21 22:00 UTC (permalink / raw)


On Tue, May 21, 2019 at 04:34:44PM -0500, Iván Chavero wrote:
> I think that the real error is this one:
> 
> print_req_error: operation not supported error, dev nvme0n1, sector
> 386138112 flags 4801

Weird. There are not very many statuses that translate to a "not
supported" from nvme, and none of them really make sense for a device
to return only during early boot. Let's see if that error is coming
from the drive or somewhere else.

Could you append the following kernel parameter:

  trace_event=nvme_setup_cmd,nvme_complete_rq

Then get the output from after the errors?

  cat /sys/kernel/debug/tracing/trace

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-21 22:00         ` Keith Busch
@ 2019-05-22  0:50           ` Iván Chavero
       [not found]           ` <CGME20190522005114epcas4p2cae7e26ab679da2e44a50b803724a336@epcms2p7>
  1 sibling, 0 replies; 10+ messages in thread
From: Iván Chavero @ 2019-05-22  0:50 UTC (permalink / raw)


On Tue, 21-05-2019 at 16:00 -0600, Keith Busch wrote:
> On Tue, May 21, 2019 at 04:34:44PM -0500, Iván Chavero wrote:
> > I think that the real error is this one:
> > 
> > print_req_error: operation not supported error, dev nvme0n1, sector
> > 386138112 flags 4801
> 
> Weird. There are not very many statuses that translate to a "not
> supported" from nvme, and none of them really make sense for a device
> to return only during early boot. Let's see if that error is coming
> from the drive or somewhere else.
> 
> Could you append the following kernel parameter:
> 
>   trace_event=nvme_setup_cmd,nvme_complete_rq
> 
> Then get the output from after the errors?
> 
>   cat /sys/kernel/debug/tracing/trace


I tried setting: 

trace_event=nvme_setup_cmd,nvme_complete_rq

but got this error:

[    0.284016] Failed to enable trace event: nvme_setup_cmd
[    0.284043] Failed to enable trace event: nvme_complete_rq

and also 

trace_event=nvme:*

and got this:

[    0.283308] Failed to enable trace event: nvme:*

What else could I check to see what the problem is?

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
       [not found]           ` <CGME20190522005114epcas4p2cae7e26ab679da2e44a50b803724a336@epcms2p7>
@ 2019-05-22  1:53             ` Minwoo Im
  2019-06-29 16:56               ` Iván Chavero
  0 siblings, 1 reply; 10+ messages in thread
From: Minwoo Im @ 2019-05-22  1:53 UTC (permalink / raw)


> I tried setting:
> 
> trace_event=nvme_setup_cmd,nvme_complete_rq
> 
> but got this error:
> 
> [    0.284016] Failed to enable trace event: nvme_setup_cmd
> [    0.284043] Failed to enable trace event: nvme_complete_rq

IMHO, maybe you can just build the nvme_core module into the kernel image
itself and try it again.  It looks like the trace_event enabling happens
before the module is loaded into the kernel.
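
If rebuilding the kernel is not convenient, a minimal sketch of an
alternative (assuming the errors can still be reproduced after boot, once
nvme_core is loaded): enable the same events at runtime through tracefs
instead of relying on the trace_event= boot parameter:

  echo 1 | sudo tee /sys/kernel/debug/tracing/events/nvme/nvme_setup_cmd/enable
  echo 1 | sudo tee /sys/kernel/debug/tracing/events/nvme/nvme_complete_rq/enable
  cat /sys/kernel/debug/tracing/trace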

^ permalink raw reply	[flat|nested] 10+ messages in thread

* nvme drive kernel 5.0 problem
  2019-05-22  1:53             ` Minwoo Im
@ 2019-06-29 16:56               ` Iván Chavero
  0 siblings, 0 replies; 10+ messages in thread
From: Iván Chavero @ 2019-06-29 16:56 UTC (permalink / raw)


On Wed, 22-05-2019 at 10:53 +0900, Minwoo Im wrote:
> > I tried setting:
> > 
> > trace_event=nvme_setup_cmd,nvme_complete_rq
> > 
> > but got this error:
> > 
> > [    0.284016] Failed to enable trace event: nvme_setup_cmd
> > [    0.284043] Failed to enable trace event: nvme_complete_rq
> 
> IMHO, maybe you just can make nvme_core module to be in kernel image
> itself and try it again.  It looks like enabling trace_event has
> occurred before
> the module is loaded onto kernel.
> 
I've compiled the nvme_core driver into the kernel and I can see the
trace events. I don't really understand them, but they do not show anything
that could resemble an error. I'm pasting examples of the two different
outputs I've identified:

         <idle>-0     [000] ..s.   409.334792: nvme_complete_rq: nvme0:
disk=nvme0n1, qid=1, cmdid=0, res=0, retries=0, flags=0x0, status=0
 dmcrypt_write/2-629   [002] ....   409.334808: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=1, cmdid=0, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=23649280, len=0, ctrl=0x0, dsmgmt=0, reftag=0)


...


          <idle>-0     [000] ..s.   409.334737: nvme_complete_rq:
nvme0: disk=nvme0n1, qid=1, cmdid=0, res=0, retries=0, flags=0x0,
status=0
    kworker/2:1H-165   [002] ....   409.334759: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=1, cmdid=0, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=23136987, len=0, ctrl=0x0, dsmgmt=0, reftag=0)
          <idle>-0     [000] ..s.   409.334792: nvme_complete_rq:
nvme0: disk=nvme0n1, qid=1, cmdid=0, res=0, retries=0, flags=0x0,
status=0
 dmcrypt_write/2-629   [002] ....   409.334808: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=1, cmdid=0, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=23649280, len=0, ctrl=0x0, dsmgmt=0, reftag=0)
          <idle>-0     [000] ..s.   409.334837: nvme_complete_rq:
nvme0: disk=nvme0n1, qid=1, cmdid=0, res=0, retries=0, flags=0x0,
status=0


I hope this can be helpful. BTW, I'm testing with kernel 5.1.12.

Thanks for the help
Iván

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-06-29 16:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16  0:20 nvme drive kernel 5.0 problem Iván Chavero
2019-05-16  3:57 ` Ming Lei
2019-05-16  5:50   ` Iván Chavero
2019-05-20 22:12   ` Iván Chavero
2019-05-21 14:20     ` Keith Busch
2019-05-21 21:34       ` Iván Chavero
2019-05-21 22:00         ` Keith Busch
2019-05-22  0:50           ` Iván Chavero
     [not found]           ` <CGME20190522005114epcas4p2cae7e26ab679da2e44a50b803724a336@epcms2p7>
2019-05-22  1:53             ` Minwoo Im
2019-06-29 16:56               ` Iván Chavero
