* kernel BUG at nvme/host/pci.c
@ 2017-07-10 18:03 Andreas Pflug
  2017-07-10 19:08 ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Pflug @ 2017-07-10 18:03 UTC (permalink / raw)


I'm running a patched (see below) Debian 4.9.30 kernel with Xen 4.8.1 on 
Debian 9. When I start a specific virtual machine, the kernel very soon emits

     kernel BUG at /usr/src/kernel/linux-4.9.30/drivers/nvme/host/pci.c:495!

via netconsole to my logging host and becomes unstable until a hard reset. 
Hardware is dual E5-2620v4 on a Supermicro X10DRI-T with two Samsung 
MZQLW960HMJP-00003 NVMe disks (mdadm RAID-1) backing the VHDs (OS on a 
separate SSD).

The bug was reported to Debian as https://bugs.debian.org/866511 . 
Following Ben Hutchings' advice, I patched the standard kernel with 
0001-swiotlb-ensure-that-page-sized-mappings-are-page-ali.patch since 
its description sounded promising, but the bug remains.

The log is attached, cut after 460 lines: the last trace on CPU 15 is 
repeated over and over, eventually leading to "Fixing recursive fault 
but reboot is needed!"

Regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xen2-kernel.log
Type: text/x-log
Size: 17073 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20170710/e8cb46ce/attachment.bin>


* kernel BUG at nvme/host/pci.c
  2017-07-10 18:03 kernel BUG at nvme/host/pci.c Andreas Pflug
@ 2017-07-10 19:08 ` Keith Busch
  2017-07-11  7:44   ` Andreas Pflug
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2017-07-10 19:08 UTC (permalink / raw)


On Mon, Jul 10, 2017 at 08:03:16PM +0200, Andreas Pflug wrote:
> I'm running a patched (see below) Debian 4.9.30 kernel with Xen 4.8.1 on
> Debian 9. When I start a specific virtual machine, the kernel very soon emits
> 
>     kernel BUG at /usr/src/kernel/linux-4.9.30/drivers/nvme/host/pci.c:495!
> 
> via netconsole to my logging host and becomes unstable until a hard reset.
> Hardware is dual E5-2620v4 on a Supermicro X10DRI-T with two Samsung
> MZQLW960HMJP-00003 NVMe disks (mdadm RAID-1) backing the VHDs (OS on a
> separate SSD).
> 
> The bug was reported to Debian as https://bugs.debian.org/866511 . Following
> Ben Hutchings' advice, I patched the standard kernel with
> 0001-swiotlb-ensure-that-page-sized-mappings-are-page-ali.patch since its
> description sounded promising, but the bug remains.

The BUG_ON means the nvme driver was given a scatter list that is invalid
for the constraints the NVMe device was registered with. There have been
issues in the past when NVMe is used with stacking devices like RAID,
but I think they are all resolved. Would you happen to know if this
succeeds with the 4.12 kernel? If so, I might be able to find the
patch(es) for 4.9-stable; otherwise we'll need to fix it upstream first.
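
For reference, the check that fires is the BUG_ON(dma_len < 0) at the end of
the PRP-setup walk in nvme_setup_prps(). A toy userspace model of that walk
-- illustration only, not the driver source -- shows the invariant it
enforces: once the first element's in-page offset has been consumed, every
mapped element except the last has to cover a whole number of device pages.

/*
 * Toy userspace model of the PRP walk in nvme_setup_prps() -- illustration
 * only, not the driver source.  page_size is the device page size the
 * controller was registered with (4096 here).
 */
#include <stdio.h>
#include <stdint.h>

struct seg {
	uint64_t dma_addr;
	int dma_len;
};

/* Returns 0 if the mapped list can be expressed as PRPs, -1 where the real
 * driver would hit BUG_ON(dma_len < 0). */
static int prp_walk(const struct seg *sg, int nents, int length, int page_size)
{
	int i = 0;
	uint64_t dma_addr = sg[i].dma_addr;
	int dma_len = sg[i].dma_len;
	int offset = dma_addr & (page_size - 1);

	length -= (page_size - offset);
	if (length <= 0)
		return 0;

	dma_len -= (page_size - offset);
	if (dma_len) {
		dma_addr += (page_size - offset);
	} else {
		if (++i == nents)
			return -1;
		dma_addr = sg[i].dma_addr;
		dma_len = sg[i].dma_len;
	}

	for (;;) {
		dma_len -= page_size;
		dma_addr += page_size;
		length -= page_size;
		if (length <= 0)
			break;
		if (dma_len > 0)
			continue;
		if (dma_len < 0)	/* BUG_ON(dma_len < 0) in the driver */
			return -1;
		if (++i == nents)
			return -1;
		dma_addr = sg[i].dma_addr;
		dma_len = sg[i].dma_len;
	}
	return 0;
}

int main(void)
{
	/* Valid: the first element ends exactly on a page boundary. */
	const struct seg good[] = { { 0x1000c00, 1024 }, { 0x2000000, 4096 } };
	/* Invalid: a 9216-byte element mapped at in-page offset 0 is not a
	 * whole number of pages, so the walk overshoots it by 3072 bytes. */
	const struct seg bad[]  = { { 0x3000000, 9216 }, { 0x4000000, 4096 },
				    { 0x5000000, 4096 } };

	printf("good: %d\n", prp_walk(good, 2, 1024 + 4096, 4096));       /* 0 */
	printf("bad:  %d\n", prp_walk(bad, 3, 9216 + 4096 + 4096, 4096)); /* -1 */
	return 0;
}

Anything the block layer or the DMA-mapping layer hands the driver that
violates that assumption trips the check.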


* kernel BUG at nvme/host/pci.c
  2017-07-10 19:08 ` Keith Busch
@ 2017-07-11  7:44   ` Andreas Pflug
  2017-07-11 19:45     ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Pflug @ 2017-07-11  7:44 UTC (permalink / raw)


On 10.07.17 at 21:08, Keith Busch wrote:
> On Mon, Jul 10, 2017 at 08:03:16PM +0200, Andreas Pflug wrote:
>> I'm running a patched (see below) Debian 4.9.30 kernel with Xen 4.8.1 on
>> Debian 9. When I start a specific virtual machine, the kernel very soon emits
>>
>>     kernel BUG at /usr/src/kernel/linux-4.9.30/drivers/nvme/host/pci.c:495!
>>
>> via netconsole to my logging host and becomes unstable until a hard reset.
>> Hardware is dual E5-2620v4 on a Supermicro X10DRI-T with two Samsung
>> MZQLW960HMJP-00003 NVMe disks (mdadm RAID-1) backing the VHDs (OS on a
>> separate SSD).
>>
>> The bug was reported to Debian as https://bugs.debian.org/866511 . Following
>> Ben Hutchings' advice, I patched the standard kernel with
>> 0001-swiotlb-ensure-that-page-sized-mappings-are-page-ali.patch since its
>> description sounded promising, but the bug remains.
> The BUG_ON means the nvme driver was given a scatter list that is invalid
> for the constraints the NVMe device was registered with. There have been
> issues in the past when NVMe is used with stacking devices like RAID,
> but I think they are all resolved. Would you happen to know if this
> succeeds with the 4.12 kernel? If so, I might be able to find the
> patch(es) for 4.9-stable; otherwise we'll need to fix it upstream first.

Tested with 4.12.0, result is

  kernel BUG at drivers/nvme/host/pci.c:610!

Kernel seems to recover from that, but I did a reboot anyway.

Log file attached.


Regards,

Andreas

-------------- next part --------------
Jul 11 09:37:28 xen2 [  110.002253] ------------[ cut here ]------------
Jul 11 09:37:28 xen2 [  110.002310] kernel BUG at drivers/nvme/host/pci.c:610!
Jul 11 09:37:28 xen2 [  110.002336] invalid opcode: 0000 [#1] SMP
Jul 11 09:37:28 xen2 [  110.002357] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback xen_blkback netconsole configfs bridge xen_gntdev xen_evtchn xenfs xen_privcmd dm_snapshot dm_bufio intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 iTCO_wdt crypto_simd iTCO_vendor_support glue_helper mxm_wmi cryptd snd_pcm snd_timer snd soundcore intel_rapl_perf pcspkr ast ttm e1000e drm_kms_helper joydev i2c_i801 ixgbe mei_me nvme drm ehci_pci ptp lpc_ich i2c_algo_bit sg mfd_core ehci_hcd mei pps_core nvme_core mdio ioatdma shpchp dca wmi acpi_power_meter 8021q garp mrp stp llc button ipmi_si ipmi_devintf ipmi_msghandler drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 libcrc32c
Jul 11 09:37:28 xen2 [  110.002638]  crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear bcache dm_mod raid1 md_mod sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod
Jul 11 09:37:28 xen2 [  110.002746] CPU: 0 PID: 5522 Comm: 2.hda-0 Tainted: G        W       4.12.0pse #2
Jul 11 09:37:28 xen2 [  110.002775] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
Jul 11 09:37:28 xen2 [  110.002807] task: ffff88015fb3e140 task.stack: ffffc90047b64000
Jul 11 09:37:28 xen2 [  110.002838] RIP: e030:nvme_queue_rq+0x644/0x7c0 [nvme]
Jul 11 09:37:28 xen2 [  110.002864] RSP: e02b:ffffc90047b67a10 EFLAGS: 00010286
Jul 11 09:37:28 xen2 [  110.002889] RAX: 0000000000000008 RBX: 00000000fffff400 RCX: 0000000000001000
Jul 11 09:37:28 xen2 [  110.002922] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
Jul 11 09:37:28 xen2 [  110.002954] RBP: 0000000000711000 R08: 0000000000001400 R09: ffff880171a82a00
Jul 11 09:37:28 xen2 [  110.002987] R10: 0000000000001000 R11: ffff880161316d00 R12: 0000000000006000
Jul 11 09:37:28 xen2 [  110.003019] R13: 0000000000000200 R14: ffff880161316d00 R15: 0000000000000002
Jul 11 09:37:28 xen2 [  110.003056] FS:  0000000000000000(0000) GS:ffff880186a00000(0000) knlGS:ffff880186a00000
Jul 11 09:37:28 xen2 [  110.003088] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 09:37:28 xen2 [  110.003115] CR2: 00007fed265e5fe8 CR3: 000000016eec0000 CR4: 0000000000042660
Jul 11 09:37:28 xen2 [  110.003148] Call Trace:
Jul 11 09:37:28 xen2 [  110.003169]  ? blk_mq_dispatch_rq_list+0x201/0x400
Jul 11 09:37:28 xen2 [  110.003193]  ? blk_mq_flush_busy_ctxs+0xc1/0x120
Jul 11 09:37:28 xen2 [  110.003217]  ? blk_mq_sched_dispatch_requests+0x1b1/0x1e0
Jul 11 09:37:28 xen2 [  110.003243]  ? __blk_mq_delay_run_hw_queue+0x91/0xa0
Jul 11 09:37:28 xen2 [  110.003265]  ? blk_mq_flush_plug_list+0x184/0x260
Jul 11 09:37:28 xen2 [  110.003290]  ? blk_flush_plug_list+0xf2/0x280
Jul 11 09:37:28 xen2 [  110.003312]  ? blk_finish_plug+0x27/0x40
Jul 11 09:37:28 xen2 [  110.003335]  ? dispatch_rw_block_io+0x732/0x9c0 [xen_blkback]
Jul 11 09:37:28 xen2 [  110.003363]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 11 09:37:28 xen2 [  110.003393]  ? _raw_spin_unlock_irqrestore+0x16/0x20
Jul 11 09:37:28 xen2 [  110.003415]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 11 09:37:28 xen2 [  110.003442]  ? xen_blkif_schedule+0x116/0x7f0 [xen_blkback]
Jul 11 09:37:28 xen2 [  110.003469]  ? __schedule+0x3cd/0x850
Jul 11 09:37:28 xen2 [  110.003488]  ? remove_wait_queue+0x60/0x60
Jul 11 09:37:28 xen2 [  110.003511]  ? kthread+0xfc/0x130
Jul 11 09:37:28 xen2 [  110.003530]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Jul 11 09:37:28 xen2 [  110.003556]  ? kthread_create_on_node+0x70/0x70
Jul 11 09:37:28 xen2 [  110.003581]  ? do_group_exit+0x3a/0xa0
Jul 11 09:37:28 xen2 [  110.004573]  ? ret_from_fork+0x25/0x30
Jul 11 09:37:28 xen2 [  110.005560] Code: ff 4c 89 ef 89 54 24 20 89 4c 24 18 e8 66 e0 e9 c0 8b 54 24 20 48 89 44 24 10 4c 8b 48 10 44 8b 40 18 8b 4c 24 18 e9 74 fd ff ff <0f> 0b 49 8b 77 68 48 8b 3c 24 e8 8d b3 e8 c0 83 e8 01 74 55 41 
Jul 11 09:37:28 xen2 [  110.007650] RIP: nvme_queue_rq+0x644/0x7c0 [nvme] RSP: ffffc90047b67a10
Jul 11 09:37:28 xen2 [  110.008708] ---[ end trace ad956c9e07e27784 ]---
Jul 11 09:37:28 xen2 [  110.009061] systemd-journald[413]: Compressed data object 809 -> 751 using LZ4
Jul 11 09:37:32 xen2 [  113.693382] BUG: unable to handle kernel paging request at 0000010000030000
Jul 11 09:37:32 xen2 [  113.694614] IP: __list_add_valid+0xc/0x70
Jul 11 09:37:32 xen2 [  113.695634] PGD 0 
Jul 11 09:37:32 xen2 [  113.695635] P4D 0 
Jul 11 09:37:32 xen2 [  113.696613] 
Jul 11 09:37:32 xen2 [  113.698441] Oops: 0000 [#2] SMP
Jul 11 09:37:32 xen2 [  113.699307] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback xen_blkback netconsole configfs bridge xen_gntdev xen_evtchn xenfs xen_privcmd dm_snapshot dm_bufio intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 iTCO_wdt crypto_simd iTCO_vendor_support glue_helper mxm_wmi cryptd snd_pcm snd_timer snd soundcore intel_rapl_perf pcspkr ast ttm e1000e drm_kms_helper joydev i2c_i801 ixgbe mei_me nvme drm ehci_pci ptp lpc_ich i2c_algo_bit sg mfd_core ehci_hcd mei pps_core nvme_core mdio ioatdma shpchp dca wmi acpi_power_meter 8021q garp mrp stp llc button ipmi_si ipmi_devintf ipmi_msghandler drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 libcrc32c
Jul 11 09:37:32 xen2 [  113.705697]  crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear bcache dm_mod raid1 md_mod sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod
Jul 11 09:37:32 xen2 [  113.707697] CPU: 11 PID: 106 Comm: xenwatch Tainted: G      D W       4.12.0pse #2
Jul 11 09:37:32 xen2 [  113.708720] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
Jul 11 09:37:32 xen2 [  113.709754] task: ffff88017be3b040 task.stack: ffffc900466c4000
Jul 11 09:37:32 xen2 [  113.710794] RIP: e030:__list_add_valid+0xc/0x70
Jul 11 09:37:32 xen2 [  113.711838] RSP: e02b:ffffc900466c7c78 EFLAGS: 00010046
Jul 11 09:37:32 xen2 [  113.712890] RAX: ffff88016741bb48 RBX: ffff88016741bb40 RCX: 0000000000000000
Jul 11 09:37:32 xen2 [  113.713936] RDX: ffff88016741bb48 RSI: 0000010000030000 RDI: ffffc900466c7c98
Jul 11 09:37:32 xen2 [  113.714977] RBP: 0000010000030000 R08: 0000010000030000 R09: 0000000000000000
Jul 11 09:37:32 xen2 [  113.716015] R10: ffffc900466c7d50 R11: ffffffff81f333e0 R12: ffff88016741bb38
Jul 11 09:37:32 xen2 [  113.717055] R13: ffffc900466c7c98 R14: ffff88016741bb48 R15: ffff88017be90f38
Jul 11 09:37:32 xen2 [  113.718100] FS:  0000000000000000(0000) GS:ffff880186cc0000(0000) knlGS:ffff880186cc0000
Jul 11 09:37:32 xen2 [  113.719157] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 09:37:32 xen2 [  113.720213] CR2: 0000010000030000 CR3: 000000015d554000 CR4: 0000000000042660
Jul 11 09:37:32 xen2 [  113.721283] Call Trace:
Jul 11 09:37:32 xen2 [  113.722350]  ? wait_for_completion+0xd1/0x190
Jul 11 09:37:32 xen2 [  113.723429]  ? wake_up_q+0x70/0x70
Jul 11 09:37:32 xen2 [  113.724497]  ? kthread_stop+0x43/0xf0
Jul 11 09:37:32 xen2 [  113.725581]  ? xen_blkif_disconnect+0x62/0x290 [xen_blkback]
Jul 11 09:37:32 xen2 [  113.726655]  ? xen_blkbk_remove+0x59/0xf0 [xen_blkback]
Jul 11 09:37:32 xen2 [  113.727724]  ? xenbus_dev_remove+0x4c/0xa0
Jul 11 09:37:32 xen2 [  113.728633]  ? device_release_driver_internal+0x154/0x210
Jul 11 09:37:32 xen2 [  113.729546]  ? bus_remove_device+0xf5/0x160
Jul 11 09:37:32 xen2 [  113.730461]  ? device_del+0x1cc/0x300
Jul 11 09:37:32 xen2 [  113.731526]  ? device_unregister+0x16/0x60
Jul 11 09:37:32 xen2 [  113.732436]  ? frontend_changed+0x9d/0x580 [xen_blkback]
Jul 11 09:37:32 xen2 [  113.733503]  ? xenbus_read_driver_state+0x39/0x60
Jul 11 09:37:32 xen2 [  113.734572]  ? prepare_to_wait_event+0x7a/0x150
Jul 11 09:37:32 xen2 [  113.735648]  ? xenwatch_thread+0xb7/0x150
Jul 11 09:37:32 xen2 [  113.736697]  ? remove_wait_queue+0x60/0x60
Jul 11 09:37:32 xen2 [  113.737721]  ? kthread+0xfc/0x130
Jul 11 09:37:32 xen2 [  113.738728]  ? find_watch+0x40/0x40
Jul 11 09:37:32 xen2 [  113.739723]  ? kthread_create_on_node+0x70/0x70
Jul 11 09:37:32 xen2 [  113.740569]  ? ret_from_fork+0x25/0x30
Jul 11 09:37:32 xen2 [  113.741542] Code: ff ff 48 89 e8 4c 8b 6c 24 10 48 83 c8 01 e9 0d ff ff ff e8 87 f9 d0 ff 0f 1f 80 00 00 00 00 4c 8b 42 08 48 89 d0 49 39 f0 75 18 <49> 8b 10 48 39 d0 75 27 49 39 f8 74 39 48 39 f8 74 34 b8 01 00 
Jul 11 09:37:32 xen2 [  113.743589] RIP: __list_add_valid+0xc/0x70 RSP: ffffc900466c7c78
Jul 11 09:37:32 xen2 [  113.744631] CR2: 0000010000030000
Jul 11 09:37:32 xen2 [  113.745682] ---[ end trace ad956c9e07e27785 ]---


* kernel BUG at nvme/host/pci.c
  2017-07-11 19:45     ` Keith Busch
@ 2017-07-11 19:44       ` Scott Bauer
  2017-07-12  6:06       ` Andreas Pflug
  1 sibling, 0 replies; 16+ messages in thread
From: Scott Bauer @ 2017-07-11 19:44 UTC (permalink / raw)


On Tue, Jul 11, 2017 at 03:45:24PM -0400, Keith Busch wrote:
> On Tue, Jul 11, 2017 at 09:44:47AM +0200, Andreas Pflug wrote:
> > Tested with 4.12.0, result is
> > 
> >   kernel BUG at drivers/nvme/host/pci.c:610!
> > 
> > Kernel seems to recover from that, but I did a reboot anyway.
>  
> Ugh, still observing invalid scatter lists on 4.12. Definitely recommend
> rebooting after hitting this.
> 
> There should only be two possibilities: either the block layer didn't
> split a bio that it should have, or it merged two that it shouldn't. To
> determine which, could you disable merging for NVMe before running your
> test? Something like this should accomplish that:
> 
>   # echo 1 | tee /sys/block/nvme*/queue/nomerges
> 
> On a side note, I think we should make the BUG_ON a WARN_ON, and return
> an IO error. While it'd fail IO, it should leave the system stable to
> do more prodding.

FWIW, I agree that switching the BUG_ON to a WARN_ON is a good idea.
Linus has ranted numerous times about BUG_ON'ing when a WARN_ON will do:
http://lkml.iu.edu/hypermail/linux/kernel/1610.0/00878.html


* kernel BUG at nvme/host/pci.c
  2017-07-11  7:44   ` Andreas Pflug
@ 2017-07-11 19:45     ` Keith Busch
  2017-07-11 19:44       ` Scott Bauer
  2017-07-12  6:06       ` Andreas Pflug
  0 siblings, 2 replies; 16+ messages in thread
From: Keith Busch @ 2017-07-11 19:45 UTC (permalink / raw)


On Tue, Jul 11, 2017 at 09:44:47AM +0200, Andreas Pflug wrote:
> Tested with 4.12.0, result is
> 
>   kernel BUG at drivers/nvme/host/pci.c:610!
> 
> Kernel seems to recover from that, but I did a reboot anyway.
 
Ugh, still observing invalid scatter lists on 4.12. Definitely recommend
rebooting after hitting this.

There should only be two possibilities: either the block layer didn't
split a bio that it should have, or it merged two that it shouldn't. To
determine which, could you disable merging for NVMe before running your
test? Something like this should accomplish that:

  # echo 1 | tee /sys/block/nvme*/queue/nomerges

On a side note, I think we should make the BUG_ON a WARN_ON, and return
an IO error. While it'd fail IO, it should leave the system stable to
do more prodding.

 
> Jul 11 09:37:28 xen2 [  110.002253] ------------[ cut here ]------------
> Jul 11 09:37:28 xen2 [  110.002310] kernel BUG at drivers/nvme/host/pci.c:610!
> Jul 11 09:37:28 xen2 [  110.002336] invalid opcode: 0000 [#1] SMP
> Jul 11 09:37:28 xen2 [  110.002357] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback xen_blkback netconsole configfs bridge xen_gntdev xen_evtchn xenfs xen_privcmd dm_snapshot dm_bufio intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 iTCO_wdt crypto_simd iTCO_vendor_support glue_helper mxm_wmi cryptd snd_pcm snd_timer snd soundcore intel_rapl_perf pcspkr ast ttm e1000e drm_kms_helper joydev i2c_i801 ixgbe mei_me nvme drm ehci_pci ptp lpc_ich i2c_algo_bit sg mfd_core ehci_hcd mei pps_core nvme_core mdio ioatdma shpchp dca wmi acpi_power_meter 8021q garp mrp stp llc button ipmi_si ipmi_devintf ipmi_msghandler drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 libcrc32c
> Jul 11 09:37:28 xen2 [  110.002638]  crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear bcache dm_mod raid1 md_mod sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod
> Jul 11 09:37:28 xen2 [  110.002746] CPU: 0 PID: 5522 Comm: 2.hda-0 Tainted: G        W       4.12.0pse #2
> Jul 11 09:37:28 xen2 [  110.002775] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
> Jul 11 09:37:28 xen2 [  110.002807] task: ffff88015fb3e140 task.stack: ffffc90047b64000
> Jul 11 09:37:28 xen2 [  110.002838] RIP: e030:nvme_queue_rq+0x644/0x7c0 [nvme]
> Jul 11 09:37:28 xen2 [  110.002864] RSP: e02b:ffffc90047b67a10 EFLAGS: 00010286
> Jul 11 09:37:28 xen2 [  110.002889] RAX: 0000000000000008 RBX: 00000000fffff400 RCX: 0000000000001000
> Jul 11 09:37:28 xen2 [  110.002922] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
> Jul 11 09:37:28 xen2 [  110.002954] RBP: 0000000000711000 R08: 0000000000001400 R09: ffff880171a82a00
> Jul 11 09:37:28 xen2 [  110.002987] R10: 0000000000001000 R11: ffff880161316d00 R12: 0000000000006000
> Jul 11 09:37:28 xen2 [  110.003019] R13: 0000000000000200 R14: ffff880161316d00 R15: 0000000000000002
> Jul 11 09:37:28 xen2 [  110.003056] FS:  0000000000000000(0000) GS:ffff880186a00000(0000) knlGS:ffff880186a00000
> Jul 11 09:37:28 xen2 [  110.003088] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 11 09:37:28 xen2 [  110.003115] CR2: 00007fed265e5fe8 CR3: 000000016eec0000 CR4: 0000000000042660
> Jul 11 09:37:28 xen2 [  110.003148] Call Trace:
> Jul 11 09:37:28 xen2 [  110.003169]  ? blk_mq_dispatch_rq_list+0x201/0x400
> Jul 11 09:37:28 xen2 [  110.003193]  ? blk_mq_flush_busy_ctxs+0xc1/0x120
> Jul 11 09:37:28 xen2 [  110.003217]  ? blk_mq_sched_dispatch_requests+0x1b1/0x1e0
> Jul 11 09:37:28 xen2 [  110.003243]  ? __blk_mq_delay_run_hw_queue+0x91/0xa0
> Jul 11 09:37:28 xen2 [  110.003265]  ? blk_mq_flush_plug_list+0x184/0x260
> Jul 11 09:37:28 xen2 [  110.003290]  ? blk_flush_plug_list+0xf2/0x280
> Jul 11 09:37:28 xen2 [  110.003312]  ? blk_finish_plug+0x27/0x40
> Jul 11 09:37:28 xen2 [  110.003335]  ? dispatch_rw_block_io+0x732/0x9c0 [xen_blkback]
> Jul 11 09:37:28 xen2 [  110.003363]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
> Jul 11 09:37:28 xen2 [  110.003393]  ? _raw_spin_unlock_irqrestore+0x16/0x20
> Jul 11 09:37:28 xen2 [  110.003415]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
> Jul 11 09:37:28 xen2 [  110.003442]  ? xen_blkif_schedule+0x116/0x7f0 [xen_blkback]
> Jul 11 09:37:28 xen2 [  110.003469]  ? __schedule+0x3cd/0x850
> Jul 11 09:37:28 xen2 [  110.003488]  ? remove_wait_queue+0x60/0x60
> Jul 11 09:37:28 xen2 [  110.003511]  ? kthread+0xfc/0x130
> Jul 11 09:37:28 xen2 [  110.003530]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
> Jul 11 09:37:28 xen2 [  110.003556]  ? kthread_create_on_node+0x70/0x70
> Jul 11 09:37:28 xen2 [  110.003581]  ? do_group_exit+0x3a/0xa0
> Jul 11 09:37:28 xen2 [  110.004573]  ? ret_from_fork+0x25/0x30
> Jul 11 09:37:28 xen2 [  110.005560] Code: ff 4c 89 ef 89 54 24 20 89 4c 24 18 e8 66 e0 e9 c0 8b 54 24 20 48 89 44 24 10 4c 8b 48 10 44 8b 40 18 8b 4c 24 18 e9 74 fd ff ff <0f> 0b 49 8b 77 68 48 8b 3c 24 e8 8d b3 e8 c0 83 e8 01 74 55 41 
> Jul 11 09:37:28 xen2 [  110.007650] RIP: nvme_queue_rq+0x644/0x7c0 [nvme] RSP: ffffc90047b67a10
> Jul 11 09:37:28 xen2 [  110.008708] ---[ end trace ad956c9e07e27784 ]---


* kernel BUG at nvme/host/pci.c
  2017-07-11 19:45     ` Keith Busch
  2017-07-11 19:44       ` Scott Bauer
@ 2017-07-12  6:06       ` Andreas Pflug
  2017-07-12 19:50         ` Keith Busch
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Pflug @ 2017-07-12  6:06 UTC (permalink / raw)


On 11.07.17 at 21:45, Keith Busch wrote:
> On Tue, Jul 11, 2017 at 09:44:47AM +0200, Andreas Pflug wrote:
>> Tested with 4.12.0, result is
>>
>>   kernel BUG at drivers/nvme/host/pci.c:610!
>>
>> Kernel seems to recover from that, but I did a reboot anyway.
>  
> Ugh, still observing invalid scatter lists on 4.12. Definitely recommend
> rebooting after hitting this.
>
> There should only be two possibilities: either the block layer didn't
> split a bio that it should have, or it merged two that it shouldn't. To
> determine which, could you disable merging for NVMe before running your
> test? Something like this should accomplish that:
>
>   # echo 1 | tee /sys/block/nvme*/queue/nomerges
nomerges set to 1 on both devices, same BUG_ON.

Regards,
Andreas


* kernel BUG at nvme/host/pci.c
  2017-07-12  6:06       ` Andreas Pflug
@ 2017-07-12 19:50         ` Keith Busch
  2017-07-13  8:46           ` Andreas Pflug
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2017-07-12 19:50 UTC (permalink / raw)


On Wed, Jul 12, 2017 at 08:06:29AM +0200, Andreas Pflug wrote:
> nomerges set to 1 on both devices, same BUG_ON.

Thanks for the info.

Could you possibly recreate this with the patch below? It will simply
return an IO error rather than panic, and show exactly how the invalid SGL
is constructed.

The block layer takes into account all the cases I can think of that might
break NVMe, so these details should help explain how we got here.

I'll send this as a proper patch for upstream consideration as well.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c4343c4..8cb3e89 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -533,7 +533,7 @@ static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
 }
 #endif
 
-static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
+static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct dma_pool *pool;
@@ -550,7 +550,7 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 
 	length -= (page_size - offset);
 	if (length <= 0)
-		return true;
+		return BLK_STS_OK;
 
 	dma_len -= (page_size - offset);
 	if (dma_len) {
@@ -563,7 +563,7 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 
 	if (length <= page_size) {
 		iod->first_dma = dma_addr;
-		return true;
+		return BLK_STS_OK;
 	}
 
 	nprps = DIV_ROUND_UP(length, page_size);
@@ -579,7 +579,7 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 	if (!prp_list) {
 		iod->first_dma = dma_addr;
 		iod->npages = -1;
-		return false;
+		return BLK_STS_RESOURCE;
 	}
 	list[0] = prp_list;
 	iod->first_dma = prp_dma;
@@ -589,7 +589,7 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 			__le64 *old_prp_list = prp_list;
 			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 			if (!prp_list)
-				return false;
+				return BLK_STS_RESOURCE;
 			list[iod->npages++] = prp_list;
 			prp_list[0] = old_prp_list[i - 1];
 			old_prp_list[i - 1] = cpu_to_le64(prp_dma);
@@ -603,13 +603,29 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req)
 			break;
 		if (dma_len > 0)
 			continue;
-		BUG_ON(dma_len < 0);
+		if (unlikely(dma_len < 0))
+			goto bad_sgl;
 		sg = sg_next(sg);
 		dma_addr = sg_dma_address(sg);
 		dma_len = sg_dma_len(sg);
 	}
 
-	return true;
+	return BLK_STS_OK;
+
+ bad_sgl:
+	if (WARN_ONCE(1, "Invalid SGL for payload:%d nents:%d\n",
+				blk_rq_payload_bytes(req), iod->nents)) {
+		for_each_sg(iod->sg, sg, iod->nents, i) {
+			dma_addr_t phys = sg_phys(sg);
+			printk("sg[%d] phys_addr:%pad offset:%d length:%d "
+			       "dma_address:%pad dma_length:%d\n", i, &phys,
+					sg->offset, sg->length,
+					&sg_dma_address(sg),
+					sg_dma_len(sg));
+		}
+	}
+	return BLK_STS_IOERR;
+
 }
 
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
@@ -631,7 +647,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 				DMA_ATTR_NO_WARN))
 		goto out;
 
-	if (!nvme_setup_prps(dev, req))
+	ret = nvme_setup_prps(dev, req);
+	if (ret != BLK_STS_OK)
 		goto out_unmap;
 
 	ret = BLK_STS_IOERR;
--


* kernel BUG at nvme/host/pci.c
  2017-07-12 19:50         ` Keith Busch
@ 2017-07-13  8:46           ` Andreas Pflug
  2017-07-13  9:00             ` Sagi Grimberg
  2017-07-13 13:47             ` Keith Busch
  0 siblings, 2 replies; 16+ messages in thread
From: Andreas Pflug @ 2017-07-13  8:46 UTC (permalink / raw)


On 12.07.17 at 21:50, Keith Busch wrote:
> On Wed, Jul 12, 2017 at 08:06:29AM +0200, Andreas Pflug wrote:
>> nomerges set to 1 on both devices, same BUG_ON.
> Thanks for the info.
>
> Could you possibly recreate this with the patch below? It will simply
> return an IO error rather than panic, and show exactly how the invalid SGL
> is constructed.
The patch won't compile against 4.12.0, since BLK_STS_* and blk_status_t
aren't present there. I got the latest sources from git, applied the patch
and got "Invalid SGL for payload:36864 nents:7". The system kept complaining
loudly about I/O errors on NVMe, so I rebooted.

Log attached.

Regards,
Andreas
-------------- next part --------------
Jul 13 10:37:37 xen2 [  202.688278] Invalid SGL for payload:36864 nents:7
Jul 13 10:37:37 xen2 [  202.688342] ------------[ cut here ]------------
Jul 13 10:37:37 xen2 [  202.688374] WARNING: CPU: 0 PID: 6970 at drivers/nvme/host/pci.c:623 nvme_queue_rq+0x81b/0x840 [nvme]
Jul 13 10:37:37 xen2 [  202.688413] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback xen_blkback netconsole configfs bridge xen_gntdev xen_evtchn xenfs xen_privcmd intel_rapl iTCO_wdt iTCO_vendor_support mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf snd_pcm snd_timer snd soundcore pcspkr i2c_i801 ast ttm drm_kms_helper sg joydev drm i2c_algo_bit lpc_ich mfd_core ehci_pci ehci_hcd mei_me mei e1000e ixgbe ptp nvme pps_core nvme_core mdio ioatdma shpchp dca wmi acpi_power_meter 8021q garp mrp stp llc button ipmi_si ipmi_devintf ipmi_msghandler drbd lru_cache sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10 raid456 libcrc32c crc32c_generic async_raid6_recov
Jul 13 10:37:37 xen2 [  202.688695]  async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear bcache dm_mod raid1 md_mod sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod
Jul 13 10:37:37 xen2 [  202.688780] CPU: 0 PID: 6970 Comm: 2.hda-0 Tainted: G        W       4.12.0-20170713+ #1
Jul 13 10:37:37 xen2 [  202.688817] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
Jul 13 10:37:37 xen2 [  202.688850] task: ffff880179ef5080 task.stack: ffffc9004874c000
Jul 13 10:37:37 xen2 [  202.688876] RIP: e030:nvme_queue_rq+0x81b/0x840 [nvme]
Jul 13 10:37:37 xen2 [  202.688899] RSP: e02b:ffffc9004874fa00 EFLAGS: 00010286
Jul 13 10:37:37 xen2 [  202.688925] RAX: 0000000000000025 RBX: 00000000fffff400 RCX: 0000000000000000
Jul 13 10:37:37 xen2 [  202.688954] RDX: 0000000000000000 RSI: ffff880186a0de98 RDI: ffff880186a0de98
Jul 13 10:37:37 xen2 [  202.688988] RBP: ffff88017b50f000 R08: 0000000000000001 R09: 00000000000009e7
Jul 13 10:37:37 xen2 [  202.689021] R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000200
Jul 13 10:37:37 xen2 [  202.689053] R13: 0000000000001000 R14: ffff880160134600 R15: ffff880170d91800
Jul 13 10:37:37 xen2 [  202.689091] FS:  0000000000000000(0000) GS:ffff880186a00000(0000) knlGS:ffff880186a00000
Jul 13 10:37:37 xen2 [  202.689128] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 13 10:37:37 xen2 [  202.689155] CR2: 00007f91740959a8 CR3: 0000000161716000 CR4: 0000000000042660
Jul 13 10:37:37 xen2 [  202.689189] Call Trace:
Jul 13 10:37:37 xen2 [  202.689210]  ? __sbitmap_get_word+0x2a/0x80
Jul 13 10:37:37 xen2 [  202.689235]  ? blk_mq_dispatch_rq_list+0x200/0x3d0
Jul 13 10:37:37 xen2 [  202.689257]  ? blk_mq_flush_busy_ctxs+0xd1/0x120
Jul 13 10:37:37 xen2 [  202.689279]  ? blk_mq_sched_dispatch_requests+0x1c0/0x1f0
Jul 13 10:37:37 xen2 [  202.689306]  ? __blk_mq_delay_run_hw_queue+0x8f/0xa0
Jul 13 10:37:37 xen2 [  202.689328]  ? blk_mq_flush_plug_list+0x184/0x260
Jul 13 10:37:37 xen2 [  202.689353]  ? blk_flush_plug_list+0xf2/0x280
Jul 13 10:37:37 xen2 [  202.689376]  ? blk_finish_plug+0x27/0x40
Jul 13 10:37:37 xen2 [  202.689400]  ? dispatch_rw_block_io+0x732/0x9c0 [xen_blkback]
Jul 13 10:37:37 xen2 [  202.690364]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 13 10:37:37 xen2 [  202.691408]  ? _raw_spin_unlock_irqrestore+0x16/0x20
Jul 13 10:37:37 xen2 [  202.692440]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 13 10:37:37 xen2 [  202.693420]  ? xen_blkif_schedule+0x116/0x7f0 [xen_blkback]
Jul 13 10:37:37 xen2 [  202.694361]  ? __schedule+0x3cd/0x850
Jul 13 10:37:37 xen2 [  202.695410]  ? remove_wait_queue+0x60/0x60
Jul 13 10:37:37 xen2 [  202.696432]  ? kthread+0xfc/0x130
Jul 13 10:37:37 xen2 [  202.697377]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Jul 13 10:37:37 xen2 [  202.698290]  ? kthread_create_on_node+0x70/0x70
Jul 13 10:37:37 xen2 [  202.699293]  ? do_group_exit+0x3a/0xa0
Jul 13 10:37:37 xen2 [  202.700206]  ? ret_from_fork+0x25/0x30
Jul 13 10:37:37 xen2 [  202.701184] Code: f9 ff ff 41 f6 47 4a 04 c6 05 7a 3e 00 00 01 41 8b 97 70 01 00 00 74 28 41 8b b7 90 00 00 00 48 c7 c7 b8 17 54 c0 e8 40 14 b9 c0 <0f> ff e9 4d fe ff ff 0f 0b 4c 8b 2d c5 05 6e c1 e9 53 ff ff ff 
Jul 13 10:37:37 xen2 [  202.703231] ---[ end trace 5b778353298dbe78 ]---
Jul 13 10:37:37 xen2 [  202.704217] sg[0] phys_addr:0x0000000aff50ec00 offset:3072 length:9216 dma_address:0x000000000070f000 dma_length:9216
Jul 13 10:37:37 xen2 [  202.705197] sg[1] phys_addr:0x0000000aff511000 offset:0 length:4096 dma_address:0x00000008755a1000 dma_length:4096
Jul 13 10:37:37 xen2 [  202.706275] sg[2] phys_addr:0x0000000aff5ef000 offset:0 length:8192 dma_address:0x0000000000712000 dma_length:8192
Jul 13 10:37:37 xen2 [  202.707315] sg[3] phys_addr:0x0000000aff564000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
Jul 13 10:37:37 xen2 [  202.708202] sg[4] phys_addr:0x0000000aff5a7000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
Jul 13 10:37:37 xen2 [  202.709030] sg[5] phys_addr:0x0000000aff5a6000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
Jul 13 10:37:37 xen2 [  202.709960] sg[6] phys_addr:0x0000000aff5a5000 offset:0 length:3072 dma_address:0x0000000874fc0000 dma_length:3072
Jul 13 10:37:37 xen2 [  202.710755] print_req_error: I/O error, dev nvme0n1, sector 1188548943
Jul 13 10:37:37 xen2 [  202.711527] md/raid1:md1: nvme0n1p1: rescheduling sector 1188284751
Jul 13 10:37:37 xen2 [  202.712926] sg[0] phys_addr:0x0000000aff50ec00 offset:3072 length:9216 dma_address:0x0000000000716000 dma_length:9216
Jul 13 10:37:37 xen2 [  202.712928] sg[0] phys_addr:0x0000000aff559c00 offset:3072 length:17408 dma_address:0x000000000071b000 dma_length:17408
Jul 13 10:37:37 xen2 [  202.712931] sg[1] phys_addr:0x0000000aff5f5000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
Jul 13 10:37:37 xen2 [  202.712932] sg[2] phys_addr:0x0000000aff586000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096


* kernel BUG at nvme/host/pci.c
  2017-07-13  8:46           ` Andreas Pflug
@ 2017-07-13  9:00             ` Sagi Grimberg
  2017-07-13 13:47             ` Keith Busch
  1 sibling, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2017-07-13  9:00 UTC (permalink / raw)


>>> nomerges set to 1 on both devices, same BUG_ON.
>> Thanks for the info.
>>
>> Could you possibly recreate this with the patch below? It will simply
>> return an IO error rather than panic, and show exactly how the invalid SGL
>> is constructed.
> The patch won't compile against 4.12.0, since BLK_STS_* and blk_status_t
> aren't present there. I got the latest sources from git, applied the patch
> and got "Invalid SGL for payload:36864 nents:7". The system kept complaining
> loudly about I/O errors on NVMe, so I rebooted.

I think if we iterated over the SGL entries, logging their dma_address,
offset and length, it'd be more useful.


* kernel BUG at nvme/host/pci.c
  2017-07-13  8:46           ` Andreas Pflug
  2017-07-13  9:00             ` Sagi Grimberg
@ 2017-07-13 13:47             ` Keith Busch
  2017-07-14 16:47               ` Andreas Pflug
  1 sibling, 1 reply; 16+ messages in thread
From: Keith Busch @ 2017-07-13 13:47 UTC (permalink / raw)


On Thu, Jul 13, 2017 at 10:46:27AM +0200, Andreas Pflug wrote:
> On 12.07.17 at 21:50, Keith Busch wrote:
> The patch won't compile against 4.12.0, since BLK_STS_* and blk_status_t
> aren't present there. I got the latest sources from git, applied the patch
> and got "Invalid SGL for payload:36864 nents:7". The system kept complaining
> loudly about I/O errors on NVMe, so I rebooted.

Thanks for getting this. Exactly what we needed, and IO errors would be
expected in your scenario.

> Jul 13 10:37:37 xen2 [  202.688278] Invalid SGL for payload:36864 nents:7
> Jul 13 10:37:37 xen2 [  202.688342] ------------[ cut here ]------------

<snip>

> Jul 13 10:37:37 xen2 [  202.703231] ---[ end trace 5b778353298dbe78 ]---
> Jul 13 10:37:37 xen2 [  202.704217] sg[0] phys_addr:0x0000000aff50ec00 offset:3072 length:9216 dma_address:0x000000000070f000 dma_length:9216
> Jul 13 10:37:37 xen2 [  202.705197] sg[1] phys_addr:0x0000000aff511000 offset:0 length:4096 dma_address:0x00000008755a1000 dma_length:4096
> Jul 13 10:37:37 xen2 [  202.706275] sg[2] phys_addr:0x0000000aff5ef000 offset:0 length:8192 dma_address:0x0000000000712000 dma_length:8192
> Jul 13 10:37:37 xen2 [  202.707315] sg[3] phys_addr:0x0000000aff564000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> Jul 13 10:37:37 xen2 [  202.708202] sg[4] phys_addr:0x0000000aff5a7000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> Jul 13 10:37:37 xen2 [  202.709030] sg[5] phys_addr:0x0000000aff5a6000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> Jul 13 10:37:37 xen2 [  202.709960] sg[6] phys_addr:0x0000000aff5a5000 offset:0 length:3072 dma_address:0x0000000874fc0000 dma_length:3072
> Jul 13 10:37:37 xen2 [  202.710755] print_req_error: I/O error, dev nvme0n1, sector 1188548943
> Jul 13 10:37:37 xen2 [  202.711527] md/raid1:md1: nvme0n1p1: rescheduling sector 1188284751

The first SGL entry has phys addr aff50ec00, which is a page offset of 3072,
but its dma addr is 70f000, which is offset 0. Since the DMA page offset
doesn't match the physical one, this isn't compatible with the
nvme implementation.
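
A quick check of the numbers (the constants below are copied from the sg[0]
line in the log above; illustration only):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const uint64_t page_mask = 4096 - 1;
	const uint64_t phys = 0xaff50ec00ULL;	/* sg[0] phys_addr    */
	const uint64_t dma  = 0x70f000ULL;	/* sg[0] dma_address  */

	printf("phys in-page offset: %llu\n",
	       (unsigned long long)(phys & page_mask));	/* prints 3072 */
	printf("dma  in-page offset: %llu\n",
	       (unsigned long long)(dma & page_mask));	/* prints 0    */
	return 0;
}

With an in-page offset of 0 but a dma_length of 9216 (not a multiple of
4096), the PRP walk runs 3072 bytes past the end of sg[0] before it ever
reaches sg[1], which is exactly where the old BUG_ON fired.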


* kernel BUG at nvme/host/pci.c
  2017-07-13 13:47             ` Keith Busch
@ 2017-07-14 16:47               ` Andreas Pflug
  2017-07-14 17:08                 ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Pflug @ 2017-07-14 16:47 UTC (permalink / raw)


On 13.07.17 at 15:47, Keith Busch wrote:
>
>> Jul 13 10:37:37 xen2 [  202.703231] ---[ end trace 5b778353298dbe78 ]---
>> Jul 13 10:37:37 xen2 [  202.704217] sg[0] phys_addr:0x0000000aff50ec00 offset:3072 length:9216 dma_address:0x000000000070f000 dma_length:9216
>> Jul 13 10:37:37 xen2 [  202.705197] sg[1] phys_addr:0x0000000aff511000 offset:0 length:4096 dma_address:0x00000008755a1000 dma_length:4096
>> Jul 13 10:37:37 xen2 [  202.706275] sg[2] phys_addr:0x0000000aff5ef000 offset:0 length:8192 dma_address:0x0000000000712000 dma_length:8192
>> Jul 13 10:37:37 xen2 [  202.707315] sg[3] phys_addr:0x0000000aff564000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
>> Jul 13 10:37:37 xen2 [  202.708202] sg[4] phys_addr:0x0000000aff5a7000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
>> Jul 13 10:37:37 xen2 [  202.709030] sg[5] phys_addr:0x0000000aff5a6000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
>> Jul 13 10:37:37 xen2 [  202.709960] sg[6] phys_addr:0x0000000aff5a5000 offset:0 length:3072 dma_address:0x0000000874fc0000 dma_length:3072
>> Jul 13 10:37:37 xen2 [  202.710755] print_req_error: I/O error, dev nvme0n1, sector 1188548943
>> Jul 13 10:37:37 xen2 [  202.711527] md/raid1:md1: nvme0n1p1: rescheduling sector 1188284751
> The first SGL entry has phys addr aff50ec00, which is a page offset of 3072,
> but its dma addr is 70f000, which is offset 0. Since the DMA page offset
> doesn't match the physical one, this isn't compatible with the
> nvme implementation.
So LVM2 backed by md raid1 isn't compatible with newer hardware... Any
suggestions?

Regards,
Andreas


* kernel BUG at nvme/host/pci.c
  2017-07-14 16:47               ` Andreas Pflug
@ 2017-07-14 17:08                 ` Keith Busch
  2017-07-15  8:51                     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2017-07-14 17:08 UTC (permalink / raw)


On Fri, Jul 14, 2017 at 06:47:43PM +0200, Andreas Pflug wrote:
> On 13.07.17 at 15:47, Keith Busch wrote:
> >
> >> Jul 13 10:37:37 xen2 [  202.703231] ---[ end trace 5b778353298dbe78 ]---
> >> Jul 13 10:37:37 xen2 [  202.704217] sg[0] phys_addr:0x0000000aff50ec00 offset:3072 length:9216 dma_address:0x000000000070f000 dma_length:9216
> >> Jul 13 10:37:37 xen2 [  202.705197] sg[1] phys_addr:0x0000000aff511000 offset:0 length:4096 dma_address:0x00000008755a1000 dma_length:4096
> >> Jul 13 10:37:37 xen2 [  202.706275] sg[2] phys_addr:0x0000000aff5ef000 offset:0 length:8192 dma_address:0x0000000000712000 dma_length:8192
> >> Jul 13 10:37:37 xen2 [  202.707315] sg[3] phys_addr:0x0000000aff564000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> >> Jul 13 10:37:37 xen2 [  202.708202] sg[4] phys_addr:0x0000000aff5a7000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> >> Jul 13 10:37:37 xen2 [  202.709030] sg[5] phys_addr:0x0000000aff5a6000 offset:0 length:4096 dma_address:0x0000000874fc0000 dma_length:4096
> >> Jul 13 10:37:37 xen2 [  202.709960] sg[6] phys_addr:0x0000000aff5a5000 offset:0 length:3072 dma_address:0x0000000874fc0000 dma_length:3072
> >> Jul 13 10:37:37 xen2 [  202.710755] print_req_error: I/O error, dev nvme0n1, sector 1188548943
> >> Jul 13 10:37:37 xen2 [  202.711527] md/raid1:md1: nvme0n1p1: rescheduling sector 1188284751
> > The first SGL entry has phys addr aff50ec00, which is a page offset of 3072,
> > but its dma addr is 70f000, which is offset 0. Since the DMA page offset
> > doesn't match the physical one, this isn't compatible with the
> > nvme implementation.
>
> So LVM2 backed by md raid1 isn't compatible with newer hardware... Any
> suggestions?

It's not that LVM2 or RAID isn't compatible. Either the IOMMU isn't
compatible if it can use different page offsets for DMA addresses than the
physical addresses, or the driver for it is broken. The DMA addresses
in this mapped SGL look completely broken in any case, since the last 4
entries are all the same address. That'll corrupt data.
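
Something along these lines could be used to spot such mappings while
chasing this; the helper below is purely illustrative (the name is made up
and it is not in any tree), but it only uses the standard scatterlist
accessors:

#include <linux/kernel.h>
#include <linux/scatterlist.h>

/*
 * Hypothetical debug helper -- not in any tree.  Flags mapped elements whose
 * DMA in-page offset differs from the physical one, or that repeat the
 * previous element's DMA address (both visible in the log above).
 */
static void sgl_sanity_check(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	dma_addr_t prev = 0;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		dma_addr_t phys = sg_phys(sg);
		dma_addr_t dma = sg_dma_address(sg);

		if ((phys & (PAGE_SIZE - 1)) != (dma & (PAGE_SIZE - 1)))
			pr_warn("sg[%d]: phys offset %#lx != dma offset %#lx\n",
				i, (unsigned long)(phys & (PAGE_SIZE - 1)),
				(unsigned long)(dma & (PAGE_SIZE - 1)));
		if (i && dma == prev)
			pr_warn("sg[%d]: same dma address as sg[%d]: %pad\n",
				i, i - 1, &dma);
		prev = dma;
	}
}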


* kernel BUG at nvme/host/pci.c
  2017-07-14 17:08                 ` Keith Busch
@ 2017-07-15  8:51                     ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2017-07-15  8:51 UTC (permalink / raw)


On Fri, Jul 14, 2017 at 01:08:47PM -0400, Keith Busch wrote:
> > So LVM2 backed by md raid1 isn't compatible with newer hardware... Any
> > suggestions?
> 
> It's not that LVM2 or RAID isn't compatible. Either the IOMMU isn't
> compatible if it can use different page offsets for DMA addresses than the
> physical addresses, or the driver for it is broken. The DMA addresses
> in this mapped SGL look completely broken in any case, since the last 4
> entries are all the same address. That'll corrupt data.

Given that this is a Xen system I wonder if swiotlb-xen is involved
here, which does some odd chunking of dma translations?




* kernel BUG at nvme/host/pci.c
  2017-07-15  8:51                     ` Christoph Hellwig
@ 2017-07-15 13:34                       ` Andreas Pflug
  -1 siblings, 0 replies; 16+ messages in thread
From: Andreas Pflug @ 2017-07-15 13:34 UTC (permalink / raw)


On 15.07.17 at 10:51, Christoph Hellwig wrote:
> On Fri, Jul 14, 2017 at 01:08:47PM -0400, Keith Busch wrote:
>>> So LVM2 backed by md raid1 isn't compatible with newer hardware... Any
>>> suggestions?
>> It's not that LVM2 or RAID isn't compatible. Either the IOMMU isn't
>> compatible if it can use different page offsets for DMA addresses than the
>> physical addresses, or the driver for it is broken. The DMA addresses
>> in this mapped SGL look completely broken in any case, since the last 4
>> entries are all the same address. That'll corrupt data.
> Given that this is a Xen system I wonder if swiotlb-xen is involved
> here, which does some odd chunking of dma translations?

I did some more testing.

With the data stored on SATA disks with md1 and lvm2 (i.e. just replacing
NVMe with SATA), nothing happens.
With the data stored directly on /dev/nvme1n1p1, i.e. without any
device-mapping layers in between, I get the same problem.
Log attached.

Regards,
Andreas
-------------- next part --------------
Jul 15 15:25:06 xen2 [ 4376.149215] Invalid SGL for payload:20992 nents:5
Jul 15 15:25:06 xen2 [ 4376.150382] ------------[ cut here ]------------
Jul 15 15:25:06 xen2 [ 4376.151261] WARNING: CPU: 0 PID: 29095 at drivers/nvme/host/pci.c:623 nvme_queue_rq+0x81b/0x840 [nvme]
Jul 15 15:25:06 xen2 [ 4376.152194] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback xen_blkback netconsole configfs bridge xen_gntdev xen_evtchn xenfs xen_privcmd iTCO_wdt intel_rapl iTCO_vendor_support mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf snd_pcm snd_timer snd soundcore pcspkr i2c_i801 joydev ast ttm drm_kms_helper drm sg i2c_algo_bit lpc_ich ehci_pci mfd_core ehci_hcd mei_me mei e1000e ixgbe ptp nvme pps_core mdio nvme_core ioatdma shpchp dca wmi acpi_power_meter 8021q garp mrp stp llc button ipmi_si ipmi_devintf ipmi_msghandler sunrpc drbd lru_cache ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10 raid456 libcrc32c crc32c_generic async_raid6_recov
Jul 15 15:25:06 xen2 [ 4376.158582]  async_memcpy async_pq async_xor xor async_tx raid6_pq raid0 multipath linear evdev hid_generic usbhid hid bcache dm_mod raid1 md_mod sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod
Jul 15 15:25:06 xen2 [ 4376.160593] CPU: 0 PID: 29095 Comm: 8.hda-0 Tainted: G      D W       4.12.0-20170713+ #1
Jul 15 15:25:06 xen2 [ 4376.161678] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
Jul 15 15:25:06 xen2 [ 4376.162649] task: ffff88015fdc5000 task.stack: ffffc90048134000
Jul 15 15:25:06 xen2 [ 4376.163676] RIP: e030:nvme_queue_rq+0x81b/0x840 [nvme]
Jul 15 15:25:06 xen2 [ 4376.164804] RSP: e02b:ffffc90048137a00 EFLAGS: 00010286
Jul 15 15:25:06 xen2 [ 4376.165890] RAX: 0000000000000025 RBX: 00000000fffff200 RCX: 0000000000000000
Jul 15 15:25:06 xen2 [ 4376.166982] RDX: 0000000000000000 RSI: ffff880186a0de98 RDI: ffff880186a0de98
Jul 15 15:25:06 xen2 [ 4376.168099] RBP: ffff8801732ff000 R08: 0000000000000001 R09: 0000000000000a57
Jul 15 15:25:06 xen2 [ 4376.169081] R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000200
Jul 15 15:25:06 xen2 [ 4376.170198] R13: 0000000000001000 R14: ffff88015f9d7800 R15: ffff88016fce1800
Jul 15 15:25:06 xen2 [ 4376.171330] FS:  0000000000000000(0000) GS:ffff880186a00000(0000) knlGS:ffff880186a00000
Jul 15 15:25:06 xen2 [ 4376.172474] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 15 15:25:06 xen2 [ 4376.173600] CR2: 000000b0f98d1970 CR3: 0000000175d4f000 CR4: 0000000000042660
Jul 15 15:25:06 xen2 [ 4376.174643] Call Trace:
Jul 15 15:25:06 xen2 [ 4376.175743]  ? __sbitmap_get_word+0x2a/0x80
Jul 15 15:25:06 xen2 [ 4376.176814]  ? blk_mq_dispatch_rq_list+0x200/0x3d0
Jul 15 15:25:06 xen2 [ 4376.177932]  ? blk_mq_flush_busy_ctxs+0xd1/0x120
Jul 15 15:25:06 xen2 [ 4376.178961]  ? blk_mq_sched_dispatch_requests+0x1c0/0x1f0
Jul 15 15:25:06 xen2 [ 4376.179942]  ? __blk_mq_delay_run_hw_queue+0x8f/0xa0
Jul 15 15:25:06 xen2 [ 4376.180941]  ? blk_mq_flush_plug_list+0x184/0x260
Jul 15 15:25:06 xen2 [ 4376.181935]  ? blk_flush_plug_list+0xf2/0x280
Jul 15 15:25:06 xen2 [ 4376.182952]  ? blk_finish_plug+0x27/0x40
Jul 15 15:25:06 xen2 [ 4376.183985]  ? dispatch_rw_block_io+0x732/0x9c0 [xen_blkback]
Jul 15 15:25:06 xen2 [ 4376.185059]  ? _raw_spin_lock_irqsave+0x17/0x39
Jul 15 15:25:06 xen2 [ 4376.186103]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 15 15:25:06 xen2 [ 4376.187167]  ? _raw_spin_unlock_irqrestore+0x16/0x20
Jul 15 15:25:06 xen2 [ 4376.188216]  ? __do_block_io_op+0x362/0x690 [xen_blkback]
Jul 15 15:25:06 xen2 [ 4376.189294]  ? xen_blkif_schedule+0x116/0x7f0 [xen_blkback]
Jul 15 15:25:06 xen2 [ 4376.190247]  ? __schedule+0x3cd/0x850
Jul 15 15:25:06 xen2 [ 4376.191152]  ? remove_wait_queue+0x60/0x60
Jul 15 15:25:06 xen2 [ 4376.192112]  ? kthread+0xfc/0x130
Jul 15 15:25:06 xen2 [ 4376.193169]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Jul 15 15:25:06 xen2 [ 4376.194105]  ? kthread_create_on_node+0x70/0x70
Jul 15 15:25:06 xen2 [ 4376.195059]  ? do_group_exit+0x3a/0xa0
Jul 15 15:25:06 xen2 [ 4376.196049]  ? ret_from_fork+0x25/0x30
Jul 15 15:25:06 xen2 [ 4376.197050] Code: f9 ff ff 41 f6 47 4a 04 c6 05 7a 3e 00 00 01 41 8b 97 70 01 00 00 74 28 41 8b b7 90 00 00 00 48 c7 c7 b8 87 48 c0 e8 40 a4 c4 c0 <0f> ff e9 4d fe ff ff 0f 0b 4c 8b 2d c5 95 79 c1 e9 53 ff ff ff 
Jul 15 15:25:06 xen2 [ 4376.198947] ---[ end trace 6d7d395a29c931b5 ]---
Jul 15 15:25:06 xen2 [ 4376.200012] sg[0] phys_addr:0x0000000aff549e00 offset:3584 length:4608 dma_address:0x00000000004a3000 dma_length:4608
Jul 15 15:25:06 xen2 [ 4376.200951] sg[1] phys_addr:0x0000000aff5c3000 offset:0 length:4096 dma_address:0x00000009f4a80000 dma_length:4096
Jul 15 15:25:06 xen2 [ 4376.202015] sg[2] phys_addr:0x0000000aff615000 offset:0 length:4096 dma_address:0x00000009f4a80000 dma_length:4096
Jul 15 15:25:06 xen2 [ 4376.203006] sg[3] phys_addr:0x0000000aff608000 offset:0 length:4096 dma_address:0x00000009f4a80000 dma_length:4096
Jul 15 15:25:06 xen2 [ 4376.203889] sg[4] phys_addr:0x0000000aff50e000 offset:0 length:4096 dma_address:0x00000009f5a4e000 dma_length:4096
Jul 15 15:25:06 xen2 [ 4376.204722] print_req_error: I/O error, dev nvme1n1, sector 14318951




Thread overview:
2017-07-10 18:03 kernel BUG at nvme/host/pci.c Andreas Pflug
2017-07-10 19:08 ` Keith Busch
2017-07-11  7:44   ` Andreas Pflug
2017-07-11 19:45     ` Keith Busch
2017-07-11 19:44       ` Scott Bauer
2017-07-12  6:06       ` Andreas Pflug
2017-07-12 19:50         ` Keith Busch
2017-07-13  8:46           ` Andreas Pflug
2017-07-13  9:00             ` Sagi Grimberg
2017-07-13 13:47             ` Keith Busch
2017-07-14 16:47               ` Andreas Pflug
2017-07-14 17:08                 ` Keith Busch
2017-07-15  8:51                   ` Christoph Hellwig
2017-07-15 13:34                     ` Andreas Pflug
