* [nvme] Invalid SGL for payload:91648 nents:3
@ 2019-06-25  7:51 Wolfgang Rohdewald
  2019-06-25  9:17 ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25  7:51 UTC (permalink / raw)

For some time now I have been getting the error below on an Intel M.2 SSD; smartctl
output follows. It also happens with sgl_threshold=0.

Untainted kernel 5.1.14 (it also happens with 5.1.8 and 5.1.15) on Ubuntu 19.04.

I cannot say exactly when this started. I cannot trigger it with cp -a, but after
starting Virtualbox 6.0.6_Ubuntu r129722 with a Windows 10 guest it happens quickly.
The first logged case is with Linux 5.1.11, but I had not used Virtualbox for some
time before that, so if Virtualbox is what triggers it, the bug might be older.

I have a Samsung SSD 950 PRO 512GB for the root partition; the Intel SSD holds a
separate ext4 file system. According to smartctl, the Samsung has no problems.
Using Virtualbox there also gives me unexplained print_req_error messages, but
nothing else.

After this happens, smartctl -a often says

  Read Error Information Log failed: NVMe Status 0x02

Otherwise smartctl finds no problems with the SSD, fsck -f is clean, and fsck -c
finds no bad blocks.

The sector numbers in the print_req_error messages vary, see below.
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.1.15-364] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKNW020T8
Serial Number:                      BTNH903409YJ2P0C
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2.048.408.248.320 [2,04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Jun 25 08:57:41 2019 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.60W       -        -    1  1  1  1        0       0
 2 +     2.60W       -        -    2  2  2  2        0       0
 3 -   0.0300W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    9.228.998 [4,72 TB]
Data Units Written:                 3.245.879 [1,66 TB]
Host Read Commands:                 41.518.159
Host Write Commands:                14.046.787
Controller Busy Time:               365
Power Cycles:                       40
Power On Hours:                     1.239
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Read Error Information Log failed: NVMe Status 0x02


Jun 23 07:49:20 skull kernel: [ 171.583390] vboxdrv: 0000000000000000 VBoxEhciR0.r0
Jun 23 07:52:35 skull kernel: [ 366.611276] sg[0] phys_addr:0x0000000372be9000 offset:0 length:60416 dma_address:0x0000000372be9000 dma_length:60416
Jun 23 07:52:35 skull kernel: [ 366.611277] sg[1] phys_addr:0x0000000372bf7c00 offset:3072 length:33792 dma_address:0x0000000372bf7c00 dma_length:33792
Jun 23 07:52:35 skull kernel: [ 366.611278] sg[2] phys_addr:0x0000000372929000 offset:0 length:36864 dma_address:0x0000000372929000 dma_length:36864
Jun 23 07:52:35 skull kernel: [ 366.611281] ------------[ cut here ]------------
Jun 23 07:52:35 skull kernel: [ 366.611282] Invalid SGL for payload:131072 nents:3
Jun 23 07:52:35 skull kernel: [ 366.611294] WARNING: CPU: 2 PID: 7675 at drivers/nvme/host/pci.c:748 nvme_queue_rq+0x9ed/0xa40
Jun 23 07:52:35 skull kernel: [ 366.611294] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep input_leds joydev btusb btrtl overlay btbcm btintel bluetooth ecdh_generic binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep crct10dif_pclmul snd_pcm crc32_pclmul ghash_clmulni_intel snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi aesni_intel mac80211 aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer snd cfg80211 soundcore mei_me mei intel_pch_thermal ir_rc6_decoder wmi
Jun 23 07:52:35 skull kernel: [ 366.611322] rc_rc6_mce nuvoton_cir
rc_core acpi_pad squashfs sch_fq_codel nct6775 hwmon_vid coretemp sunrpc ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm sdhci_pci cqhci i2c_i801 sdhci video
Jun 23 07:52:35 skull kernel: [ 366.611333] CPU: 2 PID: 7675 Comm: AioMgr0-N Tainted: G OE 5.1.14-363 #1
Jun 23 07:52:35 skull kernel: [ 366.611334] Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
Jun 23 07:52:35 skull kernel: [ 366.611336] RIP: 0010:nvme_queue_rq+0x9ed/0xa40
Jun 23 07:52:35 skull kernel: [ 366.611338] Code: 48 c7 c7 38 6a a3 b3 e8 a1 f0 e1 ff 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 80 7e f2 b2 e8 bd e4 a2 ff <0f> 0b bd 0a 00 00 00 e9 fb fd ff ff 49 89 df bd 09 00 00 00 e9 ee
Jun 23 07:52:35 skull kernel: [ 366.611338] RSP: 0018:ffffa396c8fb75c8 EFLAGS: 00010282
Jun 23 07:52:35 skull kernel: [ 366.611340] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000006
Jun 23 07:52:35 skull kernel: [ 366.611340] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff97649ea954f0
Jun 23 07:52:35 skull kernel: [ 366.611341] RBP: 0000000000000000 R08: 000000000000067c R09: 0000000000000004
Jun 23 07:52:35 skull kernel: [ 366.611342] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
Jun 23 07:52:35 skull kernel: [ 366.611342] R13: ffff97649aa28000 R14: 0000000000011000 R15: ffff976491a7ac00
Jun 23 07:52:35 skull kernel: [ 366.611344] FS: 00007f148b0c8700(0000) GS:ffff97649ea80000(0000) knlGS:0000000000000000
Jun 23 07:52:35 skull kernel: [ 366.611344] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 23 07:52:35 skull kernel: [ 366.611345] CR2: 00007ffb890206c4 CR3: 00000005839f8005 CR4: 00000000003626e0
Jun 23 07:52:35 skull kernel: [ 366.611346] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 23 07:52:35 skull kernel: [ 366.611347] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 23 07:52:35 skull kernel: [ 366.611347] Call Trace:
Jun 23 07:52:35 skull kernel: [ 366.611354]  __blk_mq_try_issue_directly+0x114/0x1c0
Jun 23 07:52:35 skull kernel: [ 366.611357]  blk_mq_request_issue_directly+0x48/0xe0
Jun 23 07:52:35 skull kernel: [ 366.611359]  blk_mq_try_issue_list_directly+0x41/0xb0
Jun 23 07:52:35 skull kernel: [ 366.611361]  blk_mq_sched_insert_requests+0xa5/0xf0
Jun 23 07:52:35 skull kernel: [ 366.611363]  blk_mq_flush_plug_list+0x211/0x2c0
Jun 23 07:52:35 skull kernel: [ 366.611366]  blk_flush_plug_list+0xc6/0xf0
Jun 23 07:52:35 skull kernel: [ 366.611368]  blk_mq_make_request+0x208/0x580
Jun 23 07:52:35 skull kernel: [ 366.611370]  generic_make_request+0x19e/0x400
Jun 23 07:52:35 skull kernel: [ 366.611373]  submit_bio+0x45/0x140
Jun 23 07:52:35 skull kernel: [ 366.611376]  do_blockdev_direct_IO+0x21e4/0x2e30
Jun 23 07:52:35 skull kernel: [ 366.611380]  ? find_get_pages_range_tag+0x81/0x2d0
Jun 23 07:52:35 skull kernel: [ 366.611384]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
Jun 23 07:52:35 skull kernel: [ 366.611386]  ext4_direct_IO+0x359/0x740
Jun 23 07:52:35 skull kernel: [ 366.611388]  ? touch_atime+0x33/0xe0
Jun 23 07:52:35 skull kernel: [ 366.611391]  generic_file_read_iter+0x9c/0xc90
Jun 23 07:52:35 skull kernel: [ 366.611394]  aio_read+0xec/0x160
Jun 23 07:52:35 skull kernel: [ 366.611397]  ? tomoyo_init_request_info+0x84/0x90
Jun 23 07:52:35 skull kernel: [ 366.611399]  ? tomoyo_path_number_perm+0x68/0x1e0
Jun 23 07:52:35 skull kernel: [ 366.611401]  ? _cond_resched+0x15/0x30
Jun 23 07:52:35 skull kernel: [ 366.611403]  ? __kmalloc+0x166/0x200
Jun 23 07:52:35 skull kernel: [ 366.611416]  ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
Jun 23 07:52:35 skull kernel: [ 366.611425]  ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
Jun 23 07:52:35 skull kernel: [ 366.611426]  ? kmem_cache_alloc+0x150/0x1c0
Jun 23 07:52:35 skull kernel: [ 366.611428]  io_submit_one+0x171/0xb20
Jun 23 07:52:35 skull kernel: [ 366.611434]  ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
Jun 23 07:52:35 skull kernel: [ 366.611436]  ? __x64_sys_io_submit+0xa9/0x190
Jun 23 07:52:35 skull kernel: [ 366.611441]  ? lookup_ioctx+0x1d/0x90
Jun 23 07:52:35 skull kernel: [ 366.611442]  __x64_sys_io_submit+0xa9/0x190
Jun 23 07:52:35 skull kernel: [ 366.611451]  do_syscall_64+0x5f/0x190
Jun 23 07:52:35 skull kernel: [ 366.611453]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 23 07:52:35 skull kernel: [ 366.611455] RIP: 0033:0x7f14ed70e2e9
Jun 23 07:52:35 skull kernel: [ 366.611457] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
Jun 23 07:52:35 skull kernel: [ 366.611457] RSP: 002b:00007f148b0c7b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 23 07:52:35 skull kernel: [ 366.611459] RAX: ffffffffffffffda RBX: 00007f14948aaed0 RCX: 00007f14ed70e2e9
Jun 23 07:52:35 skull kernel: [ 366.611459] RDX: 00007f148b0c7c60 RSI: 000000000000000a RDI: 00007f14d81ab000
Jun 23 07:52:35 skull kernel: [ 366.611460] RBP: 00007f148b0c7bc0 R08: 0000000000000001 R09: 00007f148b0c7c70
Jun 23 07:52:35 skull kernel: [ 366.611461] R10: 00007f149c00ea00 R11: 0000000000000246 R12: 00007f148b0c7c60
Jun 23 07:52:35 skull kernel: [ 366.611461] R13: 000000000000000a R14: 00007f149493f5a0 R15: 000000000000000a
Jun 23 07:52:35 skull kernel: [ 366.611463] ---[ end trace d2b93cc190ac7a41 ]---
Jun 23 07:52:35 skull kernel: [ 366.611466] print_req_error: I/O error, dev nvme1n1, sector 741253002 flags 4000

grep -a sector /var/log/kern.log*
/var/log/kern.log:Jun 23 07:52:35 skull kernel: [ 366.611466] print_req_error: I/O error, dev nvme1n1, sector 741253002 flags 4000
/var/log/kern.log:Jun 23 07:55:24 skull kernel: [ 535.823599] print_req_error: I/O error, dev nvme1n1, sector 160612274 flags 4000
/var/log/kern.log:Jun 23 08:04:50 skull kernel: [ 70.839418] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 0
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [ 132.746319] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 4000
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [ 132.832546] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 0
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [ 132.855611] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 4000
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [ 312.957573] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [ 312.958424] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [ 312.959489] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.127767] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.130135] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 4000
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.132458] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.134669] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 4000
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.022567] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.023451] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.024407] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.037217] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.038041] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.038891] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.040467] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 20:18:56 skull kernel: [ 8481.617887] print_req_error: I/O error, dev nvme1n1, sector 1805959090 flags 8801
/var/log/kern.log:Jun 24 22:07:18 skull kernel: [ 5407.282088] print_req_error: I/O error, dev nvme1n1, sector 1988491146 flags 4000
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [ 111.452844] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 0
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [ 111.455658] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 4000
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [ 111.458181] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 0
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [ 111.460729] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 4000
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [ 659.895032] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 0
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [ 659.904044] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 0
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [ 659.919253] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 4000
/var/log/kern.log:Jun 25 08:43:06 skull kernel: [ 108.723789] print_req_error: I/O error, dev nvme1n1, sector 1781874554 flags 0
/var/log/kern.log.1:Jun 19 18:35:10 skull kernel: [ 1209.920642] print_req_error: I/O error, dev nvme1n1, sector 313347962 flags 4000

-- 
mit freundlichen Grüssen
Wolfgang Rohdewald

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  7:51 [nvme] Invalid SGL for payload:91648 nents:3 Wolfgang Rohdewald
@ 2019-06-25  9:17 ` Ming Lei
       [not found]   ` <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25  9:17 UTC (permalink / raw)

Hi Wolfgang,

On Tue, Jun 25, 2019@09:51:04AM +0200, Wolfgang Rohdewald wrote:
> For some time now, I am getting this on an Intel SSD M.2, output of smartctl see below.
> This also happens with sgl_threshold=0.
>
> Untainted kernel 5.1.14 (happens also with 5.1.8 and 5.1.15) on Ubuntu 19.04
>
> I cannot say exactly when this started. I cannot trigger it with
> cp -a but after starting Virtualbox 6.0.6_Ubuntu r129722 with a Windows 10 client,
> it happens quickly. The first logged case is with Linux 5.1.11 but I did
> not use Virtualbox for some time before, so if that is what it triggers,
> the bug might be older.
>
> I have a Samsung SSD 950 PRO 512GB for the root partition, the Intel SSD
> holds a separate ext4 file system. According to smartctl, the Samsung
> has no problems. Using Virtualbox there also gives me unexplainable
> print_req_error messages but nothing else.
>
> After this happens, smartctl -a often says
>
> Read Error Information Log failed: NVMe Status 0x02
>
> Otherwise, smartctl finds no problems with the SSD, fsck -f is clean,
> fsck -c finds no bad blocks.
>
> The sector numbers in the print_req_error message vary, see below.
>
> [smartctl output quoted in full, identical to the report above; snipped]
>
> Read Error Information Log failed: NVMe Status 0x02
>
>
> Jun 23 07:49:20 skull kernel: [ 171.583390] vboxdrv: 0000000000000000 VBoxEhciR0.r0
> Jun 23 07:52:35 skull kernel: [ 366.611276] sg[0] phys_addr:0x0000000372be9000 offset:0 length:60416 dma_address:0x0000000372be9000 dma_length:60416
> Jun 23 07:52:35 skull kernel: [ 366.611277] sg[1] phys_addr:0x0000000372bf7c00 offset:3072 length:33792 dma_address:0x0000000372bf7c00 dma_length:33792
> Jun 23 07:52:35 skull kernel: [ 366.611278] sg[2] phys_addr:0x0000000372929000 offset:0 length:36864 dma_address:0x0000000372929000 dma_length:36864

The above sg elements are not supposed to end up in the same IO request,
given that each of them is unaligned with respect to the virt_boundary
limit, and bvec_split_segs() does check that limit.

> Jun 23 07:52:35 skull kernel: [ 366.611281] ------------[ cut here ]------------
> Jun 23 07:52:35 skull kernel: [ 366.611282] Invalid SGL for payload:131072 nents:3
> Jun 23 07:52:35 skull kernel: [ 366.611294] WARNING: CPU: 2 PID: 7675 at drivers/nvme/host/pci.c:748 nvme_queue_rq+0x9ed/0xa40

So far I have no idea how the issue is triggered. Maybe we can find
something by dumping the request: could you apply the following patch
against your 5.1 kernel source and post the log after the issue is
triggered?
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..c7999e362434 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -656,6 +656,29 @@ static void nvme_print_sgl(struct scatterlist *sgl, int nents)
 	}
 }
 
+static void nvme_dump_rq(const struct request *rq)
+{
+	int i = 0;
+	struct req_iterator rq_iter;
+	struct bio_vec bvec;
+	struct bio *bio;
+
+	trace_printk("%s: bvec for %d(f:%x/%x, seg: %d)\n", __func__,
+			rq->tag, rq->cmd_flags, rq->rq_flags,
+			rq->nr_phys_segments);
+
+	__rq_for_each_bio(bio, rq)
+		printk("bio %d: %hx/%hx %u %llu %u\n", i++,
+				bio->bi_flags, bio->bi_opf,
+				bio->bi_phys_segments,
+				(unsigned long long)bio->bi_iter.bi_sector,
+				bio->bi_iter.bi_size);
+	i = 0;
+	rq_for_each_bvec(bvec, rq, rq_iter)
+		printk("%d: pfn %lu offset %u len %u\n", i++,
+				(unsigned long)page_to_pfn(bvec.bv_page),
+				bvec.bv_offset, bvec.bv_len);
+}
 static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 		struct request *req, struct nvme_rw_command *cmnd)
 {
@@ -746,6 +769,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 	WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents),
 			"Invalid SGL for payload:%d nents:%d\n",
 			blk_rq_payload_bytes(req), iod->nents);
+	nvme_dump_rq(req);
 	return BLK_STS_IOERR;
 }

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 11+ messages in thread
[parent not found: <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>]
* [nvme] Invalid SGL for payload:91648 nents:3
       [not found] ` <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>
@ 2019-06-25  9:45   ` Ming Lei
  2019-06-25  9:50     ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25  9:45 UTC (permalink / raw)

On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> The first two sg elements should have been merged to one and would have
> avoided the error. The bug is kernel, nothing to do with the device.

The default max segment size is 64k, so the first two can't be merged.

And the 1st sg should have been dispatched as a single request.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:45   ` Ming Lei
@ 2019-06-25  9:50     ` Christoph Hellwig
  2019-06-25 10:07       ` Ming Lei
  2019-06-25 12:11       ` Wolfgang Rohdewald
  0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-06-25  9:50 UTC (permalink / raw)

On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > The first two sg elements should have been merged to one and would have
> > avoided the error. The bug is kernel, nothing to do with the device.
>
> The default max segment size is 64k, so the first two can't be merged.
>
> And the 1st sg should have been dispatched as single request.

Well, that is exactly what I fixed with my patches that went into
5.2. Looks like we need to backport those as well.

Wolfgang, can you try the latest 5.2-rc git tree?

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:50     ` Christoph Hellwig
@ 2019-06-25 10:07       ` Ming Lei
  1 sibling, 0 replies; 11+ messages in thread
From: Ming Lei @ 2019-06-25 10:07 UTC (permalink / raw)

Hi Christoph,

On Tue, Jun 25, 2019@11:50:13AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > The first two sg elements should have been merged to one and would have
> > > avoided the error. The bug is kernel, nothing to do with the device.
> >
> > The default max segment size is 64k, so the first two can't be merged.
> >
> > And the 1st sg should have been dispatched as single request.
>
> Well, that is exactly what I fixed with my patches that went into
> 5.2. Looks like we need to backport those as well.

Even without your patch this issue shouldn't happen, given that the two
sg elements shouldn't be submitted in a single request. Something must
be wrong; I'd suggest we root cause it.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:50     ` Christoph Hellwig
  2019-06-25 10:07       ` Ming Lei
@ 2019-06-25 12:11       ` Wolfgang Rohdewald
  2019-06-25 13:29         ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25 12:11 UTC (permalink / raw)

On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > The first two sg elements should have been merged to one and would have
> > > avoided the error. The bug is kernel, nothing to do with the device.
> >
> > The default max segment size is 64k, so the first two can't be merged.
> >
> > And the 1st sg should have been dispatched as single request.
>
> Well, that is exactly what I fixed with my patches that went into
> 5.2. Looks like we need to backport those as well.
>
> Wolfgang, can you try the latest 5.2-rc git tree?

5.2.0-rc6 works fine.

Do you still want me to apply nvme_dump_rq() to 5.1 for finding the root cause?

-- 
mit freundlichen Grüssen
Wolfgang Rohdewald

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 12:11       ` Wolfgang Rohdewald
@ 2019-06-25 13:29         ` Ming Lei
  2019-06-25 15:07           ` Wolfgang Rohdewald
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25 13:29 UTC (permalink / raw)

On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > The first two sg elements should have been merged to one and would have
> > > > avoided the error. The bug is kernel, nothing to do with the device.
> > >
> > > The default max segment size is 64k, so the first two can't be merged.
> > >
> > > And the 1st sg should have been dispatched as single request.
> >
> > Well, that is exactly what I fixed with my patches that went into
> > 5.2. Looks like we need to backport those as well.
> >
> > Wolfgang, can you try the latest 5.2-rc git tree?
>
> 5.2.0-rc6 works fine.
>
> Do you still want me to apply nvme_dump_rq() to 5.1 for finding the root cause?

Yeah, please dump the request; maybe there is an issue in the merge code.
BTW, please replace the trace_printk with printk in the debug patch.

Also, I am not sure Christoph's patch can be backported safely, given
that there was a report recently about a too-big max segment size.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 13:29         ` Ming Lei
@ 2019-06-25 15:07           ` Wolfgang Rohdewald
  2019-06-26  1:26             ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25 15:07 UTC (permalink / raw)

On Di, 2019-06-25@21:29 +0800, Ming Lei wrote:
> On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> > On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > > The first two sg elements should have been merged to one and would have
> > > > > avoided the error. The bug is kernel, nothing to do with the device.
> > > >
> > > > The default max segment size is 64k, so the first two can't be merged.
> > > >
> > > > And the 1st sg should have been dispatched as single request.
> > >
> > > Well, that is exactly what I fixed with my patches that went into
> > > 5.2. Looks like we need to backport those as well.
> > >
> > > Wolfgang, can you try the latest 5.2-rc git tree?
> >
> > 5.2.0-rc6 works fine.
> >
> > Do you still want me to apply nvme_dump_rq() to 5.1 for finding the root cause?
>
> Yeah, please dump the request; maybe there is an issue in the merge code.
> BTW, please replace the trace_printk with printk in the debug patch.
>
> Also, I am not sure Christoph's patch can be backported safely, given
> that there was a report recently about a too-big max segment size.
>
> Thanks,
> Ming

Jun 25 17:05:05 skull kernel: [ 150.419778] sg[0] phys_addr:0x000000049cbe9000 offset:0 length:32256 dma_address:0x000000049cbe9000 dma_length:32256
Jun 25 17:05:05 skull kernel: [ 150.419780] sg[1] phys_addr:0x000000049cbf0e00 offset:3584 length:40960 dma_address:0x000000049cbf0e00 dma_length:40960
Jun 25 17:05:05 skull kernel: [ 150.419783] ------------[ cut here ]------------
Jun 25 17:05:05 skull kernel: [ 150.419784] Invalid SGL for payload:73216 nents:2
Jun 25 17:05:05 skull kernel: [ 150.419820] WARNING: CPU: 2 PID: 3566 at drivers/nvme/host/pci.c:771 nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
Jun 25 17:05:05 skull kernel: [ 150.419821] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep btusb btrtl btbcm btintel bluetooth joydev input_leds ecdh_generic overlay binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul snd_hda_core crc32_pclmul snd_hwdep snd_pcm ghash_clmulni_intel snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi mac80211 iwlwifi snd_seq aesni_intel aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device snd_timer intel_cstate intel_rapl_perf snd cfg80211 soundcore mei_me intel_pch_thermal mei ir_rc6_decoder wmi
Jun 25 17:05:05 skull kernel: [ 150.419845] rc_rc6_mce nuvoton_cir rc_core acpi_pad squashfs sch_fq_codel sunrpc nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme sdhci_pci drm e1000e cqhci i2c_i801 nvme_core sdhci video
Jun 25 17:05:05 skull kernel: [ 150.419857] CPU: 2 PID: 3566 Comm: AioMgr0-N Tainted: G OE 5.1.15- #1
Jun 25 17:05:05 skull kernel: [ 150.419857] Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
Jun 25 17:05:05 skull kernel: [ 150.419860] RIP: 0010:nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
Jun 25 17:05:05 skull kernel: [ 150.419861] Code: 48 c7 c7 c0 85 0b c0 e8 2c 08 fb da 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 38 4a 0b c0 e8 78 fb bb da <0f> 0b e9 28 ff ff ff b8 0a 00 00 00 e9 f3 fe ff ff 4c 8b 7d 98 4d
Jun 25 17:05:05 skull kernel: [ 150.419863] RSP: 0018:ffffa39186847558 EFLAGS: 00010282
Jun 25 17:05:05 skull kernel: [ 150.419864] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Jun 25 17:05:05 skull kernel: [ 150.419865] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff95d61ea954f0
Jun 25 17:05:05 skull kernel: [ 150.419865] RBP: ffffa391868475e8 R08: 000000000000066d R09: 0000000000000004
Jun 25 17:05:05 skull kernel: [ 150.419866] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Jun 25 17:05:05 skull kernel: [ 150.419867] R13: 0000000000000002 R14: 0000000000000007 R15: ffff95d60feed000
Jun 25 17:05:05 skull kernel: [ 150.419868] FS: 00007ff4d6168700(0000) GS:ffff95d61ea80000(0000) knlGS:0000000000000000
Jun 25 17:05:05 skull kernel: [ 150.419869] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 25 17:05:05 skull kernel: [ 150.419870] CR2: ffffe20ea05ec2e8 CR3: 0000000500d24004 CR4: 00000000003626e0
Jun 25 17:05:05 skull kernel: [ 150.419870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 25 17:05:05 skull kernel: [ 150.419871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 25 17:05:05 skull kernel: [ 150.419872] Call Trace:
Jun 25 17:05:05 skull kernel: [ 150.419878]  nvme_queue_rq+0x566/0x710 [nvme]
Jun 25 17:05:05 skull kernel: [ 150.419883]  __blk_mq_try_issue_directly+0x114/0x1c0
Jun 25 17:05:05 skull kernel: [ 150.419886]  ? ktime_get_raw_ts64+0x30/0xc0
Jun 25 17:05:05 skull kernel: [ 150.419888]  blk_mq_request_issue_directly+0x48/0xe0
Jun 25 17:05:05 skull kernel: [ 150.419890]  blk_mq_try_issue_list_directly+0x41/0xb0
Jun 25 17:05:05 skull kernel: [ 150.419893]  blk_mq_sched_insert_requests+0xa5/0xf0
Jun 25 17:05:05 skull kernel: [ 150.419895]  blk_mq_flush_plug_list+0x211/0x2c0
Jun 25 17:05:05 skull kernel: [ 150.419898]  blk_flush_plug_list+0xc6/0xf0
Jun 25 17:05:05 skull kernel: [ 150.419900]  blk_mq_make_request+0x208/0x580
Jun 25 17:05:05 skull kernel: [ 150.419902]  generic_make_request+0x19e/0x400
Jun 25 17:05:05 skull kernel: [ 150.419904]  submit_bio+0x45/0x140
Jun 25 17:05:05 skull kernel: [ 150.419908]  do_blockdev_direct_IO+0x21e4/0x2e30
Jun 25 17:05:05 skull kernel: [ 150.419912]  ? find_get_pages_range_tag+0x81/0x2d0
Jun 25 17:05:05 skull kernel: [ 150.419915]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
Jun 25 17:05:05 skull kernel: [ 150.419917]  ext4_direct_IO+0x359/0x740
Jun 25 17:05:05 skull kernel: [ 150.419920]  ? touch_atime+0x33/0xe0
Jun 25 17:05:05 skull kernel: [ 150.419923]  generic_file_read_iter+0x9c/0xc90
Jun 25 17:05:05 skull kernel: [ 150.419925]  aio_read+0xec/0x160
Jun 25 17:05:05 skull kernel: [ 150.419929]  ? tomoyo_init_request_info+0x84/0x90
Jun 25 17:05:05 skull kernel: [ 150.419930]  ? tomoyo_path_number_perm+0x68/0x1e0
Jun 25 17:05:05 skull kernel: [ 150.419933]  ? _cond_resched+0x15/0x30
Jun 25 17:05:05 skull kernel: [ 150.419934]  ? __kmalloc+0x166/0x200
Jun 25 17:05:05 skull kernel: [ 150.419949]  ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
Jun 25 17:05:05 skull kernel: [ 150.419958]  ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
Jun 25 17:05:05 skull kernel: [ 150.419959]  ? kmem_cache_alloc+0x150/0x1c0
Jun 25 17:05:05 skull kernel: [ 150.419961]  io_submit_one+0x171/0xb20
Jun 25 17:05:05 skull kernel: [ 150.419967]  ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
Jun 25 17:05:05 skull kernel: [ 150.419969]  ? __x64_sys_io_submit+0xa9/0x190
Jun 25 17:05:05 skull kernel: [ 150.419971]  ? lookup_ioctx+0x1d/0x90
Jun 25 17:05:05 skull kernel: [ 150.419973]  __x64_sys_io_submit+0xa9/0x190
Jun 25 17:05:05 skull kernel: [ 150.419975]  do_syscall_64+0x5f/0x190
Jun 25 17:05:05 skull kernel: [ 150.419978]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 25 17:05:05 skull kernel: [ 150.419980] RIP: 0033:0x7ff539ace2e9
Jun 25 17:05:05 skull kernel: [ 150.419981] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
Jun 25 17:05:05 skull kernel: [ 150.419982] RSP: 002b:00007ff4d6167b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 25 17:05:05 skull kernel: [ 150.419984] RAX: ffffffffffffffda RBX: 00007ff4b801c9b0 RCX: 00007ff539ace2e9
Jun 25 17:05:05 skull kernel: [ 150.419984] RDX: 00007ff4d6167c60 RSI: 0000000000000005 RDI: 00007ff50802c000
Jun 25 17:05:05 skull kernel: [ 150.419985] RBP: 00007ff4d6167bc0 R08: 0000000000000001 R09: 00007ff4d6167c70
Jun 25 17:05:05 skull kernel: [ 150.419986] R10: 00007ff4b801b960 R11: 0000000000000246 R12: 00007ff4d6167c60
Jun 25 17:05:05 skull kernel: [ 150.419986] R13: 0000000000000005 R14: 00007ff4e0940100 R15: 0000000000000005
Jun 25 17:05:05 skull kernel: [ 150.419988] ---[ end trace 581bcb8c21b397f6 ]---
Jun 25 17:05:05 skull kernel: [ 150.419994] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.420017] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.420022] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.420027] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.420034] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [ 150.422640] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.422642] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.422644] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.422646] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.422649] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [ 150.424948] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.424949] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.424951] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.424952] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.424954] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [ 150.428223] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.428224] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.428225] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.428226] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.428229] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [ 150.452483] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.452485] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.452487] 0: pfn 4835399 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.452488] 1: pfn 4835406 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.452490] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [ 150.455112] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [ 150.455113] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [ 150.455114] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [ 150.455115] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [ 150.455118] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [ 150.457630] bio 0:
7602/0 1 2105917377 32256 Jun 25 17:05:05 skull kernel: [ 150.457632] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:05 skull kernel: [ 150.457633] 0: pfn 4835399 offset 0 len 32256 Jun 25 17:05:05 skull kernel: [ 150.457634] 1: pfn 4835406 offset 3584 len 40960 Jun 25 17:05:05 skull kernel: [ 150.457637] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:11 skull kernel: [ 155.479814] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.479817] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.479818] 0: pfn 4835945 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.479820] 1: pfn 4835952 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.479825] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:11 skull kernel: [ 155.482676] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.482679] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.482680] 0: pfn 4835945 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.482682] 1: pfn 4835952 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.482685] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:11 skull kernel: [ 155.485259] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.485261] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.485263] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.485264] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.485268] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:11 skull kernel: [ 155.487906] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.487908] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.487909] 0: pfn 4834377 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.487911] 1: pfn 4834384 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 
155.487913] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:11 skull kernel: [ 155.513063] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.513066] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.513068] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.513069] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.513073] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:11 skull kernel: [ 155.515837] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.515839] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.515840] 0: pfn 4834569 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.515841] 1: pfn 4834576 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.515843] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:11 skull kernel: [ 155.518592] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.518594] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.518595] 0: pfn 4836329 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.518596] 1: pfn 4836336 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.518599] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:11 skull kernel: [ 155.523518] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:11 skull kernel: [ 155.523520] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:11 skull kernel: [ 155.523522] 0: pfn 4836329 offset 0 len 32256 Jun 25 17:05:11 skull kernel: [ 155.523523] 1: pfn 4836336 offset 3584 len 40960 Jun 25 17:05:11 skull kernel: [ 155.523527] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:16 skull kernel: [ 160.557994] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:16 skull kernel: [ 160.557997] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:16 skull kernel: [ 160.557999] 0: pfn 4835399 
offset 0 len 32256 Jun 25 17:05:16 skull kernel: [ 160.558001] 1: pfn 4835406 offset 3584 len 40960 Jun 25 17:05:16 skull kernel: [ 160.558004] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:16 skull kernel: [ 160.560787] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:16 skull kernel: [ 160.560789] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:16 skull kernel: [ 160.560791] 0: pfn 4834377 offset 0 len 32256 Jun 25 17:05:16 skull kernel: [ 160.560792] 1: pfn 4834384 offset 3584 len 40960 Jun 25 17:05:16 skull kernel: [ 160.560794] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:16 skull kernel: [ 160.563253] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:16 skull kernel: [ 160.563254] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:16 skull kernel: [ 160.563255] 0: pfn 4835945 offset 0 len 32256 Jun 25 17:05:16 skull kernel: [ 160.563256] 1: pfn 4835952 offset 3584 len 40960 Jun 25 17:05:16 skull kernel: [ 160.563258] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:16 skull kernel: [ 160.565808] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:16 skull kernel: [ 160.565810] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:16 skull kernel: [ 160.565812] 0: pfn 4836329 offset 0 len 32256 Jun 25 17:05:16 skull kernel: [ 160.565813] 1: pfn 4836336 offset 3584 len 40960 Jun 25 17:05:16 skull kernel: [ 160.565816] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:21 skull kernel: [ 165.588911] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.588913] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.588914] 0: pfn 4835399 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.588915] 1: pfn 4835406 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.588917] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:21 skull kernel: [ 165.591515] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 
skull kernel: [ 165.591516] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.591517] 0: pfn 4835463 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.591518] 1: pfn 4835470 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.591521] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:21 skull kernel: [ 165.596861] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.596863] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.596864] 0: pfn 4835945 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.596865] 1: pfn 4835952 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.596867] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:21 skull kernel: [ 165.618673] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.618675] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.618677] 0: pfn 4835431 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.618678] 1: pfn 4835438 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.618681] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:21 skull kernel: [ 165.620969] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.620971] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.620972] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.620973] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.620975] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:21 skull kernel: [ 165.623245] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.623246] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.623248] 0: pfn 4834377 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.623249] 1: pfn 4834384 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.623251] print_req_error: I/O error, dev 
nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:21 skull kernel: [ 165.625474] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:21 skull kernel: [ 165.625475] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:21 skull kernel: [ 165.625477] 0: pfn 4835431 offset 0 len 32256 Jun 25 17:05:21 skull kernel: [ 165.625478] 1: pfn 4835438 offset 3584 len 40960 Jun 25 17:05:21 skull kernel: [ 165.625480] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:26 skull kernel: [ 170.647608] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.647610] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.647611] 0: pfn 4835945 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.647613] 1: pfn 4835952 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.647615] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:26 skull kernel: [ 170.650245] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.650247] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.650248] 0: pfn 4834377 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.650249] 1: pfn 4834384 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.650252] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:26 skull kernel: [ 170.652436] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.652437] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.652438] 0: pfn 4835463 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.652439] 1: pfn 4835470 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.652441] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:26 skull kernel: [ 170.655278] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.655280] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.655281] 0: pfn 4836329 offset 0 len 32256 Jun 25 17:05:26 skull kernel: 
[ 170.655282] 1: pfn 4836336 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.655284] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:26 skull kernel: [ 170.678838] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.678839] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.678840] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.678841] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.678843] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:26 skull kernel: [ 170.681377] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.681377] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.681378] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.681379] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.681381] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:26 skull kernel: [ 170.683644] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.683645] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.683646] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.683647] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.683648] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:26 skull kernel: [ 170.686057] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:26 skull kernel: [ 170.686058] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:26 skull kernel: [ 170.686059] 0: pfn 4835463 offset 0 len 32256 Jun 25 17:05:26 skull kernel: [ 170.686059] 1: pfn 4835470 offset 3584 len 40960 Jun 25 17:05:26 skull kernel: [ 170.686061] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000 Jun 25 17:05:31 skull kernel: [ 175.693443] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:31 skull kernel: [ 175.693445] bio 1: 
6602/0 1 2105917440 40960 Jun 25 17:05:31 skull kernel: [ 175.693446] 0: pfn 4834601 offset 0 len 32256 Jun 25 17:05:31 skull kernel: [ 175.693448] 1: pfn 4834608 offset 3584 len 40960 Jun 25 17:05:31 skull kernel: [ 175.693451] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 Jun 25 17:05:31 skull kernel: [ 175.698118] bio 0: 7602/0 1 2105917377 32256 Jun 25 17:05:31 skull kernel: [ 175.698120] bio 1: 6602/0 1 2105917440 40960 Jun 25 17:05:31 skull kernel: [ 175.698121] 0: pfn 4834569 offset 0 len 32256 Jun 25 17:05:31 skull kernel: [ 175.698122] 1: pfn 4834576 offset 3584 len 40960 Jun 25 17:05:31 skull kernel: [ 175.698124] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0 -- mit freundlichen Gr?ssen Wolfgang Rohdewald ^ permalink raw reply [flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 15:07             ` Wolfgang Rohdewald
@ 2019-06-26  1:26               ` Ming Lei
  2019-06-26 20:18                 ` Wolfgang Rohdewald
  2019-07-04 12:09                 ` Wolfgang Rohdewald
  0 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2019-06-26  1:26 UTC (permalink / raw)

On Tue, Jun 25, 2019@05:07:30PM +0200, Wolfgang Rohdewald wrote:
> On Di, 2019-06-25@21:29 +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> > > On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > > > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > > > The first two sg elements should have been merged to one and would have
> > > > > > avoided the error. The bug is kernel, nothing to do with the device.
> > > > >
> > > > > The default max segment size is 64k, so the first two can't be merged.
> > > > >
> > > > > And the 1st sg should have been dispatched as single request.
> > > >
> > > > Well, that is exactly what I fixed with my patches that went into
> > > > 5.2. Looks like we need to backport those as well.
> > > >
> > > > Wolfgang, can you try the latest 5.2-rc git tree?
> > >
> > > 5.2.0-rc6 works fine.
> > >
> > > Do you still want me to apply nvme_dump_rq() to 5.1 for finding the root cause?
> >
> > Yeah, please dump the request, and maybe there is issue in merge code.
> > BTW, please replace the trace_printk with printk in the debug patch.
> >
> > Also not sure if Christoph's patch can be backported safely, given there
> > is report wrt. too big max segment size recently.
> >
> > Thanks,
> > Ming
>
> Jun 25 17:05:05 skull kernel: [ 150.419778] sg[0] phys_addr:0x000000049cbe9000 offset:0 length:32256 dma_address:0x000000049cbe9000 dma_length:32256
> Jun 25 17:05:05 skull kernel: [ 150.419780] sg[1] phys_addr:0x000000049cbf0e00 offset:3584 length:40960 dma_address:0x000000049cbf0e00 dma_length:40960
> Jun 25 17:05:05 skull kernel: [ 150.419783] ------------[ cut here ]------------
> Jun 25 17:05:05 skull kernel: [ 150.419784] Invalid SGL for payload:73216 nents:2
> Jun 25 17:05:05 skull kernel: [ 150.419820] WARNING: CPU: 2 PID: 3566 at drivers/nvme/host/pci.c:771 nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
> Jun 25 17:05:05 skull kernel: [ 150.419821] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace
> fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep btusb btrtl btbcm btintel bluetooth joydev input_leds ecdh_generic overlay binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi intel_rapl
> x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul snd_hda_core crc32_pclmul snd_hwdep snd_pcm ghash_clmulni_intel
> snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi mac80211 iwlwifi snd_seq aesni_intel aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device snd_timer intel_cstate intel_rapl_perf snd cfg80211
> soundcore mei_me intel_pch_thermal mei ir_rc6_decoder wmi
> Jun 25 17:05:05 skull kernel: [ 150.419845] rc_rc6_mce nuvoton_cir rc_core acpi_pad squashfs sch_fq_codel sunrpc nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 hid_logitech_hidpp
> hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme sdhci_pci drm e1000e cqhci i2c_i801 nvme_core sdhci video
> Jun 25 17:05:05 skull kernel: [ 150.419857] CPU: 2 PID: 3566 Comm: AioMgr0-N Tainted: G OE 5.1.15- #1
> Jun 25 17:05:05 skull kernel: [ 150.419857] Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
> Jun 25 17:05:05 skull kernel: [ 150.419860] RIP: 0010:nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
> Jun 25 17:05:05 skull kernel: [ 150.419861] Code: 48 c7 c7 c0 85 0b c0 e8 2c 08 fb da 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 38 4a 0b c0 e8 78 fb bb da <0f> 0b e9 28 ff ff ff
> b8 0a 00 00 00 e9 f3 fe ff ff 4c 8b 7d 98 4d
> Jun 25 17:05:05 skull kernel: [ 150.419863] RSP: 0018:ffffa39186847558 EFLAGS: 00010282
> Jun 25 17:05:05 skull kernel: [ 150.419864] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> Jun 25 17:05:05 skull kernel: [ 150.419865] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff95d61ea954f0
> Jun 25 17:05:05 skull kernel: [ 150.419865] RBP: ffffa391868475e8 R08: 000000000000066d R09: 0000000000000004
> Jun 25 17:05:05 skull kernel: [ 150.419866] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> Jun 25 17:05:05 skull kernel: [ 150.419867] R13: 0000000000000002 R14: 0000000000000007 R15: ffff95d60feed000
> Jun 25 17:05:05 skull kernel: [ 150.419868] FS: 00007ff4d6168700(0000) GS:ffff95d61ea80000(0000) knlGS:0000000000000000
> Jun 25 17:05:05 skull kernel: [ 150.419869] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 25 17:05:05 skull kernel: [ 150.419870] CR2: ffffe20ea05ec2e8 CR3: 0000000500d24004 CR4: 00000000003626e0
> Jun 25 17:05:05 skull kernel: [ 150.419870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jun 25 17:05:05 skull kernel: [ 150.419871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Jun 25 17:05:05 skull kernel: [ 150.419872] Call Trace:
> Jun 25 17:05:05 skull kernel: [ 150.419878] nvme_queue_rq+0x566/0x710 [nvme]
> Jun 25 17:05:05 skull kernel: [ 150.419883] __blk_mq_try_issue_directly+0x114/0x1c0
> Jun 25 17:05:05 skull kernel: [ 150.419886] ? ktime_get_raw_ts64+0x30/0xc0
> Jun 25 17:05:05 skull kernel: [ 150.419888] blk_mq_request_issue_directly+0x48/0xe0
> Jun 25 17:05:05 skull kernel: [ 150.419890] blk_mq_try_issue_list_directly+0x41/0xb0
> Jun 25 17:05:05 skull kernel: [ 150.419893] blk_mq_sched_insert_requests+0xa5/0xf0
> Jun 25 17:05:05 skull kernel: [ 150.419895] blk_mq_flush_plug_list+0x211/0x2c0
> Jun 25 17:05:05 skull kernel: [ 150.419898] blk_flush_plug_list+0xc6/0xf0
> Jun 25 17:05:05 skull kernel: [ 150.419900] blk_mq_make_request+0x208/0x580
> Jun 25 17:05:05 skull kernel: [ 150.419902] generic_make_request+0x19e/0x400
> Jun 25 17:05:05 skull kernel: [ 150.419904] submit_bio+0x45/0x140
> Jun 25 17:05:05 skull kernel: [ 150.419908] do_blockdev_direct_IO+0x21e4/0x2e30
> Jun 25 17:05:05 skull kernel: [ 150.419912] ? find_get_pages_range_tag+0x81/0x2d0
> Jun 25 17:05:05 skull kernel: [ 150.419915] ? ext4_dio_get_block_unwritten_sync+0x50/0x50
> Jun 25 17:05:05 skull kernel: [ 150.419917] ext4_direct_IO+0x359/0x740
> Jun 25 17:05:05 skull kernel: [ 150.419920] ? touch_atime+0x33/0xe0
> Jun 25 17:05:05 skull kernel: [ 150.419923] generic_file_read_iter+0x9c/0xc90
> Jun 25 17:05:05 skull kernel: [ 150.419925] aio_read+0xec/0x160
> Jun 25 17:05:05 skull kernel: [ 150.419929] ? tomoyo_init_request_info+0x84/0x90
> Jun 25 17:05:05 skull kernel: [ 150.419930] ? tomoyo_path_number_perm+0x68/0x1e0
> Jun 25 17:05:05 skull kernel: [ 150.419933] ? _cond_resched+0x15/0x30
> Jun 25 17:05:05 skull kernel: [ 150.419934] ? __kmalloc+0x166/0x200
> Jun 25 17:05:05 skull kernel: [ 150.419949] ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [ 150.419958] ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [ 150.419959] ? kmem_cache_alloc+0x150/0x1c0
> Jun 25 17:05:05 skull kernel: [ 150.419961] io_submit_one+0x171/0xb20
> Jun 25 17:05:05 skull kernel: [ 150.419967] ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [ 150.419969] ? __x64_sys_io_submit+0xa9/0x190
> Jun 25 17:05:05 skull kernel: [ 150.419971] ? lookup_ioctx+0x1d/0x90
> Jun 25 17:05:05 skull kernel: [ 150.419973] __x64_sys_io_submit+0xa9/0x190
> Jun 25 17:05:05 skull kernel: [ 150.419975] do_syscall_64+0x5f/0x190
> Jun 25 17:05:05 skull kernel: [ 150.419978] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Jun 25 17:05:05 skull kernel: [ 150.419980] RIP: 0033:0x7ff539ace2e9
> Jun 25 17:05:05 skull kernel: [ 150.419981] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
> 01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
> Jun 25 17:05:05 skull kernel: [ 150.419982] RSP: 002b:00007ff4d6167b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> Jun 25 17:05:05 skull kernel: [ 150.419984] RAX: ffffffffffffffda RBX: 00007ff4b801c9b0 RCX: 00007ff539ace2e9
> Jun 25 17:05:05 skull kernel: [ 150.419984] RDX: 00007ff4d6167c60 RSI: 0000000000000005 RDI: 00007ff50802c000
> Jun 25 17:05:05 skull kernel: [ 150.419985] RBP: 00007ff4d6167bc0 R08: 0000000000000001 R09: 00007ff4d6167c70
> Jun 25 17:05:05 skull kernel: [ 150.419986] R10: 00007ff4b801b960 R11: 0000000000000246 R12: 00007ff4d6167c60
> Jun 25 17:05:05 skull kernel: [ 150.419986] R13: 0000000000000005 R14: 00007ff4e0940100 R15: 0000000000000005
> Jun 25 17:05:05 skull kernel: [ 150.419988] ---[ end trace 581bcb8c21b397f6 ]---
> Jun 25 17:05:05 skull kernel: [ 150.419994] bio 0: 7602/0 1 2105917377 32256
> Jun 25 17:05:05 skull kernel: [ 150.420017] bio 1: 6602/0 1 2105917440 40960
> Jun 25 17:05:05 skull kernel: [ 150.420022] 0: pfn 4836329 offset 0 len 32256
> Jun 25 17:05:05 skull kernel: [ 150.420027] 1: pfn 4836336 offset 3584 len 40960

I can understand the issue now. It is not related to bio splitting; it is in
the normal bio merge code: biovec_phys_mergeable() merges two bios when their
bvecs are physically contiguous, without considering the max segment size
limit. That limit is still applied in blk_bvec_map_sg(), so an sg element
with an unaligned virt boundary is produced.

Christoph's patch 09324d32d2a08 ("block: force an unlimited segment size on
queues with a virt boundary") can fix the issue, but it might cause another
regression because the segment size can become too big from the IOMMU's
point of view.

And the following patch should fix the 5.1 issue too:

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1c9d4f0f96ea..ff69b7dd2179 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
 	 */
 	bio_get_last_bvec(prev, &pb);
 	bio_get_first_bvec(next, &nb);
-	if (bios_segs_mergeable(q, prev, &pb, &nb))
+	if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
+			next->bi_seg_front_size < queue_max_segment_size(q))
 		return false;
 	return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 }

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-26  1:26               ` Ming Lei
@ 2019-06-26 20:18                 ` Wolfgang Rohdewald
  1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-26 20:18 UTC (permalink / raw)

On Mi, 2019-06-26@09:26 +0800, Ming Lei wrote:
> And the following patch should fix the 5.1 issue too:
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 1c9d4f0f96ea..ff69b7dd2179 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
>  	 */
>  	bio_get_last_bvec(prev, &pb);
>  	bio_get_first_bvec(next, &nb);
> -	if (bios_segs_mergeable(q, prev, &pb, &nb))
> +	if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
> +			next->bi_seg_front_size < queue_max_segment_size(q))
>  		return false;
>  	return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
>  }

So far, this works for me with 5.1.15. Before, the bug would trigger within
seconds after starting a Windows 10 update in VirtualBox 6.0.6; now it does not.

-- 
mit freundlichen Grüssen
Wolfgang Rohdewald

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-26  1:26               ` Ming Lei
  2019-06-26 20:18                 ` Wolfgang Rohdewald
@ 2019-07-04 12:09                 ` Wolfgang Rohdewald
  1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-07-04 12:09 UTC (permalink / raw)

On Mi, 2019-06-26@09:26 +0800, Ming Lei wrote:
> I can understand the issue now. It is not related to bio splitting; it is
> in the normal bio merge code: biovec_phys_mergeable() merges two bios when
> their bvecs are physically contiguous, without considering the max segment
> size limit. That limit is still applied in blk_bvec_map_sg(), so an sg
> element with an unaligned virt boundary is produced.
>
> Christoph's patch 09324d32d2a08 ("block: force an unlimited segment size
> on queues with a virt boundary") can fix the issue, but it might cause
> another regression because the segment size can become too big from the
> IOMMU's point of view.
>
> And the following patch should fix the 5.1 issue too:
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 1c9d4f0f96ea..ff69b7dd2179 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
>  	 */
>  	bio_get_last_bvec(prev, &pb);
>  	bio_get_first_bvec(next, &nb);
> -	if (bios_segs_mergeable(q, prev, &pb, &nb))
> +	if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
> +			next->bi_seg_front_size < queue_max_segment_size(q))
>  		return false;
>  	return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
>  }

I wonder why this is not in 5.1.16.

-- 
mit freundlichen Grüssen
Wolfgang Rohdewald

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2019-07-04 12:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25  7:51 [nvme] Invalid SGL for payload:91648 nents:3 Wolfgang Rohdewald
2019-06-25  9:17 ` Ming Lei
     [not found] ` <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>
2019-06-25  9:45   ` Ming Lei
2019-06-25  9:50     ` Christoph Hellwig
2019-06-25 10:07       ` Ming Lei
2019-06-25 12:11         ` Wolfgang Rohdewald
2019-06-25 13:29           ` Ming Lei
2019-06-25 15:07             ` Wolfgang Rohdewald
2019-06-26  1:26               ` Ming Lei
2019-06-26 20:18                 ` Wolfgang Rohdewald
2019-07-04 12:09                 ` Wolfgang Rohdewald