* [nvme] Invalid SGL for payload:91648 nents:3
@ 2019-06-25  7:51 Wolfgang Rohdewald
  2019-06-25  9:17 ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25  7:51 UTC (permalink / raw)


For some time now I have been getting this on an Intel M.2 SSD; smartctl output is below.
This also happens with sgl_threshold=0.

Untainted kernel 5.1.14 (it also happens with 5.1.8 and 5.1.15) on Ubuntu 19.04

I cannot say exactly when this started. I cannot trigger it with
cp -a, but after starting VirtualBox 6.0.6_Ubuntu r129722 with a Windows 10 guest
it happens quickly. The first logged case is with Linux 5.1.11, but I had
not used VirtualBox for some time before that, so if that is what triggers it,
the bug might be older.

I have a Samsung SSD 950 PRO 512GB for the root partition; the Intel SSD
holds a separate ext4 file system. According to smartctl, the Samsung
has no problems. Using VirtualBox there also gives me unexplained
print_req_error messages, but nothing else.

After this happens, smartctl -a often says

Read Error Information Log failed: NVMe Status 0x02

Otherwise, smartctl finds no problems with the SSD, fsck -f is clean,
and fsck -c finds no bad blocks.

The sector numbers in the print_req_error messages vary; see below.

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.1.15-364] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKNW020T8
Serial Number:                      BTNH903409YJ2P0C
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2.048.408.248.320 [2,04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Jun 25 08:57:41 2019 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.60W       -        -    1  1  1  1        0       0
 2 +     2.60W       -        -    2  2  2  2        0       0
 3 -   0.0300W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    9.228.998 [4,72 TB]
Data Units Written:                 3.245.879 [1,66 TB]
Host Read Commands:                 41.518.159
Host Write Commands:                14.046.787
Controller Busy Time:               365
Power Cycles:                       40
Power On Hours:                     1.239
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Read Error Information Log failed: NVMe Status 0x02


Jun 23 07:49:20 skull kernel: [  171.583390] vboxdrv: 0000000000000000 VBoxEhciR0.r0
Jun 23 07:52:35 skull kernel: [  366.611276] sg[0] phys_addr:0x0000000372be9000 offset:0 length:60416 dma_address:0x0000000372be9000 dma_length:60416
Jun 23 07:52:35 skull kernel: [  366.611277] sg[1] phys_addr:0x0000000372bf7c00 offset:3072 length:33792 dma_address:0x0000000372bf7c00 dma_length:33792
Jun 23 07:52:35 skull kernel: [  366.611278] sg[2] phys_addr:0x0000000372929000 offset:0 length:36864 dma_address:0x0000000372929000 dma_length:36864
Jun 23 07:52:35 skull kernel: [  366.611281] ------------[ cut here ]------------
Jun 23 07:52:35 skull kernel: [  366.611282] Invalid SGL for payload:131072 nents:3
Jun 23 07:52:35 skull kernel: [  366.611294] WARNING: CPU: 2 PID: 7675 at drivers/nvme/host/pci.c:748 nvme_queue_rq+0x9ed/0xa40
Jun 23 07:52:35 skull kernel: [  366.611294] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace
fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep input_leds joydev btusb btrtl overlay btbcm btintel bluetooth ecdh_generic binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi
snd_hda_codec_realtek intel_rapl x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep crct10dif_pclmul snd_pcm crc32_pclmul
ghash_clmulni_intel snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi aesni_intel mac80211 aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device
snd_timer snd cfg80211 soundcore mei_me mei intel_pch_thermal ir_rc6_decoder wmi
Jun 23 07:52:35 skull kernel: [  366.611322]  rc_rc6_mce nuvoton_cir rc_core acpi_pad squashfs sch_fq_codel nct6775 hwmon_vid coretemp sunrpc ip_tables x_tables autofs4 hid_logitech_hidpp
hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm sdhci_pci cqhci i2c_i801 sdhci video
Jun 23 07:52:35 skull kernel: [  366.611333] CPU: 2 PID: 7675 Comm: AioMgr0-N Tainted: G           OE     5.1.14-363 #1
Jun 23 07:52:35 skull kernel: [  366.611334] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
Jun 23 07:52:35 skull kernel: [  366.611336] RIP: 0010:nvme_queue_rq+0x9ed/0xa40
Jun 23 07:52:35 skull kernel: [  366.611338] Code: 48 c7 c7 38 6a a3 b3 e8 a1 f0 e1 ff 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 80 7e f2 b2 e8 bd e4 a2 ff <0f> 0b bd 0a 00 00 00
e9 fb fd ff ff 49 89 df bd 09 00 00 00 e9 ee
Jun 23 07:52:35 skull kernel: [  366.611338] RSP: 0018:ffffa396c8fb75c8 EFLAGS: 00010282
Jun 23 07:52:35 skull kernel: [  366.611340] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000006
Jun 23 07:52:35 skull kernel: [  366.611340] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff97649ea954f0
Jun 23 07:52:35 skull kernel: [  366.611341] RBP: 0000000000000000 R08: 000000000000067c R09: 0000000000000004
Jun 23 07:52:35 skull kernel: [  366.611342] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
Jun 23 07:52:35 skull kernel: [  366.611342] R13: ffff97649aa28000 R14: 0000000000011000 R15: ffff976491a7ac00
Jun 23 07:52:35 skull kernel: [  366.611344] FS:  00007f148b0c8700(0000) GS:ffff97649ea80000(0000) knlGS:0000000000000000
Jun 23 07:52:35 skull kernel: [  366.611344] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 23 07:52:35 skull kernel: [  366.611345] CR2: 00007ffb890206c4 CR3: 00000005839f8005 CR4: 00000000003626e0
Jun 23 07:52:35 skull kernel: [  366.611346] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 23 07:52:35 skull kernel: [  366.611347] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 23 07:52:35 skull kernel: [  366.611347] Call Trace:
Jun 23 07:52:35 skull kernel: [  366.611354]  __blk_mq_try_issue_directly+0x114/0x1c0
Jun 23 07:52:35 skull kernel: [  366.611357]  blk_mq_request_issue_directly+0x48/0xe0
Jun 23 07:52:35 skull kernel: [  366.611359]  blk_mq_try_issue_list_directly+0x41/0xb0
Jun 23 07:52:35 skull kernel: [  366.611361]  blk_mq_sched_insert_requests+0xa5/0xf0
Jun 23 07:52:35 skull kernel: [  366.611363]  blk_mq_flush_plug_list+0x211/0x2c0
Jun 23 07:52:35 skull kernel: [  366.611366]  blk_flush_plug_list+0xc6/0xf0
Jun 23 07:52:35 skull kernel: [  366.611368]  blk_mq_make_request+0x208/0x580
Jun 23 07:52:35 skull kernel: [  366.611370]  generic_make_request+0x19e/0x400
Jun 23 07:52:35 skull kernel: [  366.611373]  submit_bio+0x45/0x140
Jun 23 07:52:35 skull kernel: [  366.611376]  do_blockdev_direct_IO+0x21e4/0x2e30
Jun 23 07:52:35 skull kernel: [  366.611380]  ? find_get_pages_range_tag+0x81/0x2d0
Jun 23 07:52:35 skull kernel: [  366.611384]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
Jun 23 07:52:35 skull kernel: [  366.611386]  ext4_direct_IO+0x359/0x740
Jun 23 07:52:35 skull kernel: [  366.611388]  ? touch_atime+0x33/0xe0
Jun 23 07:52:35 skull kernel: [  366.611391]  generic_file_read_iter+0x9c/0xc90
Jun 23 07:52:35 skull kernel: [  366.611394]  aio_read+0xec/0x160
Jun 23 07:52:35 skull kernel: [  366.611397]  ? tomoyo_init_request_info+0x84/0x90
Jun 23 07:52:35 skull kernel: [  366.611399]  ? tomoyo_path_number_perm+0x68/0x1e0
Jun 23 07:52:35 skull kernel: [  366.611401]  ? _cond_resched+0x15/0x30
Jun 23 07:52:35 skull kernel: [  366.611403]  ? __kmalloc+0x166/0x200
Jun 23 07:52:35 skull kernel: [  366.611416]  ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
Jun 23 07:52:35 skull kernel: [  366.611425]  ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
Jun 23 07:52:35 skull kernel: [  366.611426]  ? kmem_cache_alloc+0x150/0x1c0
Jun 23 07:52:35 skull kernel: [  366.611428]  io_submit_one+0x171/0xb20
Jun 23 07:52:35 skull kernel: [  366.611434]  ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
Jun 23 07:52:35 skull kernel: [  366.611436]  ? __x64_sys_io_submit+0xa9/0x190
Jun 23 07:52:35 skull kernel: [  366.611441]  ? lookup_ioctx+0x1d/0x90
Jun 23 07:52:35 skull kernel: [  366.611442]  __x64_sys_io_submit+0xa9/0x190
Jun 23 07:52:35 skull kernel: [  366.611451]  do_syscall_64+0x5f/0x190
Jun 23 07:52:35 skull kernel: [  366.611453]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 23 07:52:35 skull kernel: [  366.611455] RIP: 0033:0x7f14ed70e2e9
Jun 23 07:52:35 skull kernel: [  366.611457] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
Jun 23 07:52:35 skull kernel: [  366.611457] RSP: 002b:00007f148b0c7b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 23 07:52:35 skull kernel: [  366.611459] RAX: ffffffffffffffda RBX: 00007f14948aaed0 RCX: 00007f14ed70e2e9
Jun 23 07:52:35 skull kernel: [  366.611459] RDX: 00007f148b0c7c60 RSI: 000000000000000a RDI: 00007f14d81ab000
Jun 23 07:52:35 skull kernel: [  366.611460] RBP: 00007f148b0c7bc0 R08: 0000000000000001 R09: 00007f148b0c7c70
Jun 23 07:52:35 skull kernel: [  366.611461] R10: 00007f149c00ea00 R11: 0000000000000246 R12: 00007f148b0c7c60
Jun 23 07:52:35 skull kernel: [  366.611461] R13: 000000000000000a R14: 00007f149493f5a0 R15: 000000000000000a
Jun 23 07:52:35 skull kernel: [  366.611463] ---[ end trace d2b93cc190ac7a41 ]---
Jun 23 07:52:35 skull kernel: [  366.611466] print_req_error: I/O error, dev nvme1n1, sector 741253002 flags 4000

grep -a sector /var/log/kern.log*
/var/log/kern.log:Jun 23 07:52:35 skull kernel: [  366.611466] print_req_error: I/O error, dev nvme1n1, sector 741253002 flags 4000
/var/log/kern.log:Jun 23 07:55:24 skull kernel: [  535.823599] print_req_error: I/O error, dev nvme1n1, sector 160612274 flags 4000
/var/log/kern.log:Jun 23 08:04:50 skull kernel: [   70.839418] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 0
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [  132.746319] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 4000
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [  132.832546] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 0
/var/log/kern.log:Jun 23 08:05:52 skull kernel: [  132.855611] print_req_error: I/O error, dev nvme1n1, sector 161154970 flags 4000
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [  312.957573] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [  312.958424] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 23 08:08:52 skull kernel: [  312.959489] print_req_error: I/O error, dev nvme1n1, sector 161150857 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.127767] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.130135] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 4000
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.132458] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 0
/var/log/kern.log:Jun 24 18:57:11 skull kernel: [ 3576.134669] print_req_error: I/O error, dev nvme1n1, sector 790335378 flags 4000
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.022567] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.023451] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.024407] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.037217] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.038041] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.038891] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 18:58:40 skull kernel: [ 3665.040467] print_req_error: I/O error, dev nvme1n1, sector 740323209 flags 0
/var/log/kern.log:Jun 24 20:18:56 skull kernel: [ 8481.617887] print_req_error: I/O error, dev nvme1n1, sector 1805959090 flags 8801
/var/log/kern.log:Jun 24 22:07:18 skull kernel: [ 5407.282088] print_req_error: I/O error, dev nvme1n1, sector 1988491146 flags 4000
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [  111.452844] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 0
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [  111.455658] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 4000
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [  111.458181] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 0
/var/log/kern.log:Jun 25 07:23:32 skull kernel: [  111.460729] print_req_error: I/O error, dev nvme1n1, sector 1988980610 flags 4000
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [  659.895032] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 0
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [  659.904044] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 0
/var/log/kern.log:Jun 25 08:33:22 skull kernel: [  659.919253] print_req_error: I/O error, dev nvme1n1, sector 739135402 flags 4000
/var/log/kern.log:Jun 25 08:43:06 skull kernel: [  108.723789] print_req_error: I/O error, dev nvme1n1, sector 1781874554 flags 0
/var/log/kern.log.1:Jun 19 18:35:10 skull kernel: [ 1209.920642] print_req_error: I/O error, dev nvme1n1, sector 313347962 flags 4000



-- 
With kind regards

Wolfgang Rohdewald


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  7:51 [nvme] Invalid SGL for payload:91648 nents:3 Wolfgang Rohdewald
@ 2019-06-25  9:17 ` Ming Lei
       [not found]   ` <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25  9:17 UTC (permalink / raw)


Hi Wolfgang,

On Tue, Jun 25, 2019@09:51:04AM +0200, Wolfgang Rohdewald wrote:
> For some time now I have been getting this on an Intel M.2 SSD; smartctl output is below.
> This also happens with sgl_threshold=0.
> 
> Untainted kernel 5.1.14 (it also happens with 5.1.8 and 5.1.15) on Ubuntu 19.04
> 
> I cannot say exactly when this started. I cannot trigger it with
> cp -a, but after starting VirtualBox 6.0.6_Ubuntu r129722 with a Windows 10 guest
> it happens quickly. The first logged case is with Linux 5.1.11, but I had
> not used VirtualBox for some time before that, so if that is what triggers it,
> the bug might be older.
> 
> I have a Samsung SSD 950 PRO 512GB for the root partition; the Intel SSD
> holds a separate ext4 file system. According to smartctl, the Samsung
> has no problems. Using VirtualBox there also gives me unexplained
> print_req_error messages, but nothing else.
> 
> After this happens, smartctl -a often says
> 
> Read Error Information Log failed: NVMe Status 0x02
> 
> Otherwise, smartctl finds no problems with the SSD, fsck -f is clean,
> and fsck -c finds no bad blocks.
> 
> The sector numbers in the print_req_error messages vary; see below.
> 
> smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.1.15-364] (local build)
> Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Number:                       INTEL SSDPEKNW020T8
> Serial Number:                      BTNH903409YJ2P0C
> Firmware Version:                   002C
> PCI Vendor/Subsystem ID:            0x8086
> IEEE OUI Identifier:                0x5cd2e4
> Controller ID:                      1
> Number of Namespaces:               1
> Namespace 1 Size/Capacity:          2.048.408.248.320 [2,04 TB]
> Namespace 1 Formatted LBA Size:     512
> Local Time is:                      Tue Jun 25 08:57:41 2019 CEST
> Firmware Updates (0x14):            2 Slots, no Reset required
> Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
> Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
> Maximum Data Transfer Size:         32 Pages
> Warning  Comp. Temp. Threshold:     77 Celsius
> Critical Comp. Temp. Threshold:     80 Celsius
> 
> Supported Power States
> St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
>  0 +     5.50W       -        -    0  0  0  0        0       0
>  1 +     3.60W       -        -    1  1  1  1        0       0
>  2 +     2.60W       -        -    2  2  2  2        0       0
>  3 -   0.0300W       -        -    3  3  3  3     5000    5000
>  4 -   0.0040W       -        -    4  4  4  4     5000    9000
> 
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
>  0 +     512       0         0
> 
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SMART/Health Information (NVMe Log 0x02, NSID 0x1)
> Critical Warning:                   0x00
> Temperature:                        32 Celsius
> Available Spare:                    100%
> Available Spare Threshold:          10%
> Percentage Used:                    0%
> Data Units Read:                    9.228.998 [4,72 TB]
> Data Units Written:                 3.245.879 [1,66 TB]
> Host Read Commands:                 41.518.159
> Host Write Commands:                14.046.787
> Controller Busy Time:               365
> Power Cycles:                       40
> Power On Hours:                     1.239
> Unsafe Shutdowns:                   11
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      0
> Warning  Comp. Temperature Time:    0
> Critical Comp. Temperature Time:    0
> 
> Read Error Information Log failed: NVMe Status 0x02
> 
> 
> Jun 23 07:49:20 skull kernel: [  171.583390] vboxdrv: 0000000000000000 VBoxEhciR0.r0
> Jun 23 07:52:35 skull kernel: [  366.611276] sg[0] phys_addr:0x0000000372be9000 offset:0 length:60416 dma_address:0x0000000372be9000 dma_length:60416
> Jun 23 07:52:35 skull kernel: [  366.611277] sg[1] phys_addr:0x0000000372bf7c00 offset:3072 length:33792 dma_address:0x0000000372bf7c00 dma_length:33792
> Jun 23 07:52:35 skull kernel: [  366.611278] sg[2] phys_addr:0x0000000372929000 offset:0 length:36864 dma_address:0x0000000372929000 dma_length:36864

The above sg elements are not supposed to be in the same IO request, given each one
is unaligned with respect to the virt_boundary limit, and bvec_split_segs() does check that limit.
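
To illustrate, here is a minimal user-space sketch (not kernel code: it assumes
the PAGE_SIZE - 1 = 0xfff virt_boundary that the NVMe driver sets, and mirrors
the check done by __bvec_gap_to_prev() in block/blk.h), applied to sg[0] and
sg[1] from the dump above:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRT_BOUNDARY 0xfffULL		/* assumed: PAGE_SIZE - 1 */

/* Mirrors __bvec_gap_to_prev(): a segment may only follow another in the
 * same request if the previous one ends on the boundary and the next one
 * starts on it. */
static bool gap_to_prev(uint64_t prev_addr, uint64_t prev_len, uint64_t next_addr)
{
	return ((prev_addr + prev_len) & VIRT_BOUNDARY) ||
	       (next_addr & VIRT_BOUNDARY);
}

int main(void)
{
	uint64_t sg0_addr = 0x372be9000ULL, sg0_len = 60416;	/* sg[0] */
	uint64_t sg1_addr = 0x372bf7c00ULL;			/* sg[1] */

	/* sg[0] ends at 0x372bf7c00, not page aligned, and sg[1] starts at
	 * offset 0xc00 into a page, so this prints "gap: yes" -- the two
	 * should have been split into separate requests. */
	printf("gap: %s\n",
	       gap_to_prev(sg0_addr, sg0_len, sg1_addr) ? "yes" : "no");
	return 0;
}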


> Jun 23 07:52:35 skull kernel: [  366.611281] ------------[ cut here ]------------
> Jun 23 07:52:35 skull kernel: [  366.611282] Invalid SGL for payload:131072 nents:3
> Jun 23 07:52:35 skull kernel: [  366.611294] WARNING: CPU: 2 PID: 7675 at drivers/nvme/host/pci.c:748 nvme_queue_rq+0x9ed/0xa40

So far I have no idea how the issue is triggered; maybe we can find something
by dumping the request. Could you apply the following patch against your
5.1 kernel source and post the log after the issue is triggered?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..c7999e362434 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -656,6 +656,29 @@ static void nvme_print_sgl(struct scatterlist *sgl, int nents)
 	}
 }
 
+static void nvme_dump_rq(const struct request *rq)
+{
+	int i = 0;
+	struct req_iterator rq_iter;
+	struct bio_vec bvec;
+	struct bio *bio;
+
+	trace_printk("%s: bvec for %d(f:%x/%x, seg: %d)\n", __func__,
+			rq->tag, rq->cmd_flags, rq->rq_flags,
+			rq->nr_phys_segments);
+
+	__rq_for_each_bio(bio, rq)
+		printk("bio %d: %hx/%hx %u %llu %u\n", i++,
+			bio->bi_flags, bio->bi_opf,
+			bio->bi_phys_segments,
+			(unsigned long long)bio->bi_iter.bi_sector,
+			bio->bi_iter.bi_size);
+	i = 0;
+	rq_for_each_bvec(bvec, rq, rq_iter)
+		printk("%d: pfn %lu offset %u len %u\n", i++,
+				(unsigned long)page_to_pfn(bvec.bv_page),
+				bvec.bv_offset, bvec.bv_len);
+}
 static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 		struct request *req, struct nvme_rw_command *cmnd)
 {
@@ -746,6 +769,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 	WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents),
 			"Invalid SGL for payload:%d nents:%d\n",
 			blk_rq_payload_bytes(req), iod->nents);
+	nvme_dump_rq(req);
 	return BLK_STS_IOERR;
 }
 

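(To read the output of this patch: each "bio N: ..." line prints bi_flags/bi_opf,
bi_phys_segments, the starting sector and the bio size in bytes, and each
"N: pfn ..." line prints one bvec's page frame number, offset and length.)
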
Thanks,
Ming


* [nvme] Invalid SGL for payload:91648 nents:3
       [not found]   ` <CAOSXXT5PgPJqouD0G7qJjar5tmy-9z+wWdFMqpD2K1=5vQZ+HA@mail.gmail.com>
@ 2019-06-25  9:45     ` Ming Lei
  2019-06-25  9:50       ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25  9:45 UTC (permalink / raw)


On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> The first two sg elements should have been merged into one, which would have
> avoided the error. The bug is in the kernel; it has nothing to do with the device.

The default max segment size is 64k, so the first two can't be merged.

And the 1st sg should have been dispatched as a single request.
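
(Worked through with the numbers from the dump above: merging sg[0] and sg[1]
would give 60416 + 33792 = 94208 bytes, over the default 64k (65536-byte)
segment size limit, and since sg[0] ends unaligned at 0x372bf7c00 the
virt_boundary rule means nothing should have been appended after it anyway.)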

Thanks,
Ming


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:45     ` Ming Lei
@ 2019-06-25  9:50       ` Christoph Hellwig
  2019-06-25 10:07         ` Ming Lei
  2019-06-25 12:11         ` Wolfgang Rohdewald
  0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-06-25  9:50 UTC (permalink / raw)


On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > The first two sg elements should have been merged into one, which would have
> > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> 
> The default max segment size is 64k, so the first two can't be merged.
> 
> And the 1st sg should have been dispatched as a single request.

Well, that is exactly what I fixed with my patches that went into
5.2.  Looks like we need to backport those as well.

Wolfgang, can you try the latest 5.2-rc git tree?


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:50       ` Christoph Hellwig
@ 2019-06-25 10:07         ` Ming Lei
  2019-06-25 12:11         ` Wolfgang Rohdewald
  1 sibling, 0 replies; 11+ messages in thread
From: Ming Lei @ 2019-06-25 10:07 UTC (permalink / raw)


Hi Christoph,

On Tue, Jun 25, 2019@11:50:13AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > The first two sg elements should have been merged into one, which would have
> > > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> > 
> > The default max segment size is 64k, so the first two can't be merged.
> > 
> > And the 1st sg should have been dispatched as a single request.
> 
> Well, that is exactly what I fixed with my patches that went into
> 5.2.  Looks like we need to backport those as well.

Even without your patch this issue shouldn't happen, given the
two sg elements shouldn't be submitted in a single request. Something must be
wrong; I'd suggest root-causing it.
 

Thanks,
Ming


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25  9:50       ` Christoph Hellwig
  2019-06-25 10:07         ` Ming Lei
@ 2019-06-25 12:11         ` Wolfgang Rohdewald
  2019-06-25 13:29           ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25 12:11 UTC (permalink / raw)


On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > The first two sg elements should have been merged into one, which would have
> > > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> > 
> > The default max segment size is 64k, so the first two can't be merged.
> > 
> > And the 1st sg should have been dispatched as a single request.
> 
> Well, that is exactly what I fixed with my patches that went into
> 5.2.  Looks like we need to backport those as well.
> 
> Wolfgang, can you try the latest 5.2-rc git tree?

5.2.0-rc6 works fine.

Do you still want me to apply nvme_dump_rq() to 5.1 to find the root cause?

-- 
With kind regards

Wolfgang Rohdewald


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 12:11         ` Wolfgang Rohdewald
@ 2019-06-25 13:29           ` Ming Lei
  2019-06-25 15:07             ` Wolfgang Rohdewald
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-06-25 13:29 UTC (permalink / raw)


On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > The first two sg elements should have been merged into one, which would have
> > > > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> > > 
> > > The default max segment size is 64k, so the first two can't be merged.
> > > 
> > > And the 1st sg should have been dispatched as a single request.
> > 
> > Well, that is exactly what I fixed with my patches that went into
> > 5.2.  Looks like we need to backport those as well.
> > 
> > Wolfgang, can you try the latest 5.2-rc git tree?
> 
> 5.2.0-rc6 works fine.
> 
> Do you still want me to apply nvme_dump_rq() to 5.1 to find the root cause?

Yeah, please dump the request; maybe there is an issue in the merge code.
BTW, please replace the trace_printk with printk in the debug patch, so the
output lands in the kernel log rather than the ftrace buffer.

Also, I'm not sure Christoph's patch can be backported safely, given there
was a recent report about too big a max segment size.

Thanks,
Ming


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 13:29           ` Ming Lei
@ 2019-06-25 15:07             ` Wolfgang Rohdewald
  2019-06-26  1:26               ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-25 15:07 UTC (permalink / raw)


On Di, 2019-06-25@21:29 +0800, Ming Lei wrote:
> On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> > On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > > The first two sg elements should have been merged into one, which would have
> > > > > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> > > > 
> > > > The default max segment size is 64k, so the first two can't be merged.
> > > > 
> > > > And the 1st sg should have been dispatched as a single request.
> > > 
> > > Well, that is exactly what I fixed with my patches that went into
> > > 5.2.  Looks like we need to backport those as well.
> > > 
> > > Wolfgang, can you try the latest 5.2-rc git tree?
> > 
> > 5.2.0-rc6 works fine.
> > 
> > Do you still want me to apply nvme_dump_rq() to 5.1 to find the root cause?
> 
> Yeah, please dump the request; maybe there is an issue in the merge code.
> BTW, please replace the trace_printk with printk in the debug patch, so the
> output lands in the kernel log rather than the ftrace buffer.
> 
> Also, I'm not sure Christoph's patch can be backported safely, given there
> was a recent report about too big a max segment size.
> 
> Thanks,
> Ming


Jun 25 17:05:05 skull kernel: [  150.419778] sg[0] phys_addr:0x000000049cbe9000 offset:0 length:32256 dma_address:0x000000049cbe9000 dma_length:32256
Jun 25 17:05:05 skull kernel: [  150.419780] sg[1] phys_addr:0x000000049cbf0e00 offset:3584 length:40960 dma_address:0x000000049cbf0e00 dma_length:40960
Jun 25 17:05:05 skull kernel: [  150.419783] ------------[ cut here ]------------
Jun 25 17:05:05 skull kernel: [  150.419784] Invalid SGL for payload:73216 nents:2
Jun 25 17:05:05 skull kernel: [  150.419820] WARNING: CPU: 2 PID: 3566 at drivers/nvme/host/pci.c:771 nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
Jun 25 17:05:05 skull kernel: [  150.419821] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace
fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep btusb btrtl btbcm btintel bluetooth joydev input_leds ecdh_generic overlay binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi intel_rapl
x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul snd_hda_core crc32_pclmul snd_hwdep snd_pcm ghash_clmulni_intel
snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi mac80211 iwlwifi snd_seq aesni_intel aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device snd_timer intel_cstate intel_rapl_perf snd cfg80211
soundcore mei_me intel_pch_thermal mei ir_rc6_decoder wmi
Jun 25 17:05:05 skull kernel: [  150.419845]  rc_rc6_mce nuvoton_cir rc_core acpi_pad squashfs sch_fq_codel sunrpc nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 hid_logitech_hidpp
hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme sdhci_pci drm e1000e cqhci i2c_i801 nvme_core sdhci video
Jun 25 17:05:05 skull kernel: [  150.419857] CPU: 2 PID: 3566 Comm: AioMgr0-N Tainted: G           OE     5.1.15- #1
Jun 25 17:05:05 skull kernel: [  150.419857] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
Jun 25 17:05:05 skull kernel: [  150.419860] RIP: 0010:nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
Jun 25 17:05:05 skull kernel: [  150.419861] Code: 48 c7 c7 c0 85 0b c0 e8 2c 08 fb da 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 38 4a 0b c0 e8 78 fb bb da <0f> 0b e9 28 ff ff ff
b8 0a 00 00 00 e9 f3 fe ff ff 4c 8b 7d 98 4d
Jun 25 17:05:05 skull kernel: [  150.419863] RSP: 0018:ffffa39186847558 EFLAGS: 00010282
Jun 25 17:05:05 skull kernel: [  150.419864] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Jun 25 17:05:05 skull kernel: [  150.419865] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff95d61ea954f0
Jun 25 17:05:05 skull kernel: [  150.419865] RBP: ffffa391868475e8 R08: 000000000000066d R09: 0000000000000004
Jun 25 17:05:05 skull kernel: [  150.419866] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Jun 25 17:05:05 skull kernel: [  150.419867] R13: 0000000000000002 R14: 0000000000000007 R15: ffff95d60feed000
Jun 25 17:05:05 skull kernel: [  150.419868] FS:  00007ff4d6168700(0000) GS:ffff95d61ea80000(0000) knlGS:0000000000000000
Jun 25 17:05:05 skull kernel: [  150.419869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 25 17:05:05 skull kernel: [  150.419870] CR2: ffffe20ea05ec2e8 CR3: 0000000500d24004 CR4: 00000000003626e0
Jun 25 17:05:05 skull kernel: [  150.419870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 25 17:05:05 skull kernel: [  150.419871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 25 17:05:05 skull kernel: [  150.419872] Call Trace:
Jun 25 17:05:05 skull kernel: [  150.419878]  nvme_queue_rq+0x566/0x710 [nvme]
Jun 25 17:05:05 skull kernel: [  150.419883]  __blk_mq_try_issue_directly+0x114/0x1c0
Jun 25 17:05:05 skull kernel: [  150.419886]  ? ktime_get_raw_ts64+0x30/0xc0
Jun 25 17:05:05 skull kernel: [  150.419888]  blk_mq_request_issue_directly+0x48/0xe0
Jun 25 17:05:05 skull kernel: [  150.419890]  blk_mq_try_issue_list_directly+0x41/0xb0
Jun 25 17:05:05 skull kernel: [  150.419893]  blk_mq_sched_insert_requests+0xa5/0xf0
Jun 25 17:05:05 skull kernel: [  150.419895]  blk_mq_flush_plug_list+0x211/0x2c0
Jun 25 17:05:05 skull kernel: [  150.419898]  blk_flush_plug_list+0xc6/0xf0
Jun 25 17:05:05 skull kernel: [  150.419900]  blk_mq_make_request+0x208/0x580
Jun 25 17:05:05 skull kernel: [  150.419902]  generic_make_request+0x19e/0x400
Jun 25 17:05:05 skull kernel: [  150.419904]  submit_bio+0x45/0x140
Jun 25 17:05:05 skull kernel: [  150.419908]  do_blockdev_direct_IO+0x21e4/0x2e30
Jun 25 17:05:05 skull kernel: [  150.419912]  ? find_get_pages_range_tag+0x81/0x2d0
Jun 25 17:05:05 skull kernel: [  150.419915]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
Jun 25 17:05:05 skull kernel: [  150.419917]  ext4_direct_IO+0x359/0x740
Jun 25 17:05:05 skull kernel: [  150.419920]  ? touch_atime+0x33/0xe0
Jun 25 17:05:05 skull kernel: [  150.419923]  generic_file_read_iter+0x9c/0xc90
Jun 25 17:05:05 skull kernel: [  150.419925]  aio_read+0xec/0x160
Jun 25 17:05:05 skull kernel: [  150.419929]  ? tomoyo_init_request_info+0x84/0x90
Jun 25 17:05:05 skull kernel: [  150.419930]  ? tomoyo_path_number_perm+0x68/0x1e0
Jun 25 17:05:05 skull kernel: [  150.419933]  ? _cond_resched+0x15/0x30
Jun 25 17:05:05 skull kernel: [  150.419934]  ? __kmalloc+0x166/0x200
Jun 25 17:05:05 skull kernel: [  150.419949]  ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
Jun 25 17:05:05 skull kernel: [  150.419958]  ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
Jun 25 17:05:05 skull kernel: [  150.419959]  ? kmem_cache_alloc+0x150/0x1c0
Jun 25 17:05:05 skull kernel: [  150.419961]  io_submit_one+0x171/0xb20
Jun 25 17:05:05 skull kernel: [  150.419967]  ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
Jun 25 17:05:05 skull kernel: [  150.419969]  ? __x64_sys_io_submit+0xa9/0x190
Jun 25 17:05:05 skull kernel: [  150.419971]  ? lookup_ioctx+0x1d/0x90
Jun 25 17:05:05 skull kernel: [  150.419973]  __x64_sys_io_submit+0xa9/0x190
Jun 25 17:05:05 skull kernel: [  150.419975]  do_syscall_64+0x5f/0x190
Jun 25 17:05:05 skull kernel: [  150.419978]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 25 17:05:05 skull kernel: [  150.419980] RIP: 0033:0x7ff539ace2e9
Jun 25 17:05:05 skull kernel: [  150.419981] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
Jun 25 17:05:05 skull kernel: [  150.419982] RSP: 002b:00007ff4d6167b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 25 17:05:05 skull kernel: [  150.419984] RAX: ffffffffffffffda RBX: 00007ff4b801c9b0 RCX: 00007ff539ace2e9
Jun 25 17:05:05 skull kernel: [  150.419984] RDX: 00007ff4d6167c60 RSI: 0000000000000005 RDI: 00007ff50802c000
Jun 25 17:05:05 skull kernel: [  150.419985] RBP: 00007ff4d6167bc0 R08: 0000000000000001 R09: 00007ff4d6167c70
Jun 25 17:05:05 skull kernel: [  150.419986] R10: 00007ff4b801b960 R11: 0000000000000246 R12: 00007ff4d6167c60
Jun 25 17:05:05 skull kernel: [  150.419986] R13: 0000000000000005 R14: 00007ff4e0940100 R15: 0000000000000005
Jun 25 17:05:05 skull kernel: [  150.419988] ---[ end trace 581bcb8c21b397f6 ]---
Jun 25 17:05:05 skull kernel: [  150.419994] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.420017] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.420022] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.420027] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.420034] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [  150.422640] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.422642] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.422644] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.422646] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.422649] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [  150.424948] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.424949] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.424951] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.424952] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.424954] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [  150.428223] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.428224] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.428225] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.428226] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.428229] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [  150.452483] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.452485] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.452487] 0: pfn 4835399 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.452488] 1: pfn 4835406 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.452490] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:05 skull kernel: [  150.455112] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.455113] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.455114] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.455115] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.455118] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:05 skull kernel: [  150.457630] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:05 skull kernel: [  150.457632] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:05 skull kernel: [  150.457633] 0: pfn 4835399 offset 0 len 32256
Jun 25 17:05:05 skull kernel: [  150.457634] 1: pfn 4835406 offset 3584 len 40960
Jun 25 17:05:05 skull kernel: [  150.457637] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:11 skull kernel: [  155.479814] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.479817] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.479818] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.479820] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.479825] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:11 skull kernel: [  155.482676] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.482679] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.482680] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.482682] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.482685] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:11 skull kernel: [  155.485259] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.485261] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.485263] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.485264] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.485268] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:11 skull kernel: [  155.487906] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.487908] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.487909] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.487911] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.487913] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:11 skull kernel: [  155.513063] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.513066] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.513068] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.513069] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.513073] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:11 skull kernel: [  155.515837] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.515839] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.515840] 0: pfn 4834569 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.515841] 1: pfn 4834576 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.515843] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:11 skull kernel: [  155.518592] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.518594] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.518595] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.518596] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.518599] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:11 skull kernel: [  155.523518] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:11 skull kernel: [  155.523520] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:11 skull kernel: [  155.523522] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:11 skull kernel: [  155.523523] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:11 skull kernel: [  155.523527] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:16 skull kernel: [  160.557994] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:16 skull kernel: [  160.557997] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:16 skull kernel: [  160.557999] 0: pfn 4835399 offset 0 len 32256
Jun 25 17:05:16 skull kernel: [  160.558001] 1: pfn 4835406 offset 3584 len 40960
Jun 25 17:05:16 skull kernel: [  160.558004] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:16 skull kernel: [  160.560787] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:16 skull kernel: [  160.560789] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:16 skull kernel: [  160.560791] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:16 skull kernel: [  160.560792] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:16 skull kernel: [  160.560794] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:16 skull kernel: [  160.563253] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:16 skull kernel: [  160.563254] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:16 skull kernel: [  160.563255] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:16 skull kernel: [  160.563256] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:16 skull kernel: [  160.563258] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:16 skull kernel: [  160.565808] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:16 skull kernel: [  160.565810] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:16 skull kernel: [  160.565812] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:16 skull kernel: [  160.565813] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:16 skull kernel: [  160.565816] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:21 skull kernel: [  165.588911] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.588913] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.588914] 0: pfn 4835399 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.588915] 1: pfn 4835406 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.588917] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:21 skull kernel: [  165.591515] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.591516] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.591517] 0: pfn 4835463 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.591518] 1: pfn 4835470 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.591521] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:21 skull kernel: [  165.596861] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.596863] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.596864] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.596865] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.596867] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:21 skull kernel: [  165.618673] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.618675] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.618677] 0: pfn 4835431 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.618678] 1: pfn 4835438 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.618681] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:21 skull kernel: [  165.620969] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.620971] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.620972] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.620973] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.620975] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:21 skull kernel: [  165.623245] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.623246] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.623248] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.623249] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.623251] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:21 skull kernel: [  165.625474] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:21 skull kernel: [  165.625475] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:21 skull kernel: [  165.625477] 0: pfn 4835431 offset 0 len 32256
Jun 25 17:05:21 skull kernel: [  165.625478] 1: pfn 4835438 offset 3584 len 40960
Jun 25 17:05:21 skull kernel: [  165.625480] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:26 skull kernel: [  170.647608] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.647610] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.647611] 0: pfn 4835945 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.647613] 1: pfn 4835952 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.647615] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:26 skull kernel: [  170.650245] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.650247] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.650248] 0: pfn 4834377 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.650249] 1: pfn 4834384 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.650252] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:26 skull kernel: [  170.652436] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.652437] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.652438] 0: pfn 4835463 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.652439] 1: pfn 4835470 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.652441] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:26 skull kernel: [  170.655278] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.655280] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.655281] 0: pfn 4836329 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.655282] 1: pfn 4836336 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.655284] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:26 skull kernel: [  170.678838] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.678839] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.678840] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.678841] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.678843] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:26 skull kernel: [  170.681377] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.681377] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.681378] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.681379] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.681381] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:26 skull kernel: [  170.683644] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.683645] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.683646] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.683647] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.683648] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:26 skull kernel: [  170.686057] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:26 skull kernel: [  170.686058] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:26 skull kernel: [  170.686059] 0: pfn 4835463 offset 0 len 32256
Jun 25 17:05:26 skull kernel: [  170.686059] 1: pfn 4835470 offset 3584 len 40960
Jun 25 17:05:26 skull kernel: [  170.686061] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 4000
Jun 25 17:05:31 skull kernel: [  175.693443] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:31 skull kernel: [  175.693445] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:31 skull kernel: [  175.693446] 0: pfn 4834601 offset 0 len 32256
Jun 25 17:05:31 skull kernel: [  175.693448] 1: pfn 4834608 offset 3584 len 40960
Jun 25 17:05:31 skull kernel: [  175.693451] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0
Jun 25 17:05:31 skull kernel: [  175.698118] bio 0: 7602/0 1 2105917377 32256
Jun 25 17:05:31 skull kernel: [  175.698120] bio 1: 6602/0 1 2105917440 40960
Jun 25 17:05:31 skull kernel: [  175.698121] 0: pfn 4834569 offset 0 len 32256
Jun 25 17:05:31 skull kernel: [  175.698122] 1: pfn 4834576 offset 3584 len 40960
Jun 25 17:05:31 skull kernel: [  175.698124] print_req_error: I/O error, dev nvme1n1, sector 2105917377 flags 0


-- 
With kind regards

Wolfgang Rohdewald


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-25 15:07             ` Wolfgang Rohdewald
@ 2019-06-26  1:26               ` Ming Lei
  2019-06-26 20:18                 ` Wolfgang Rohdewald
  2019-07-04 12:09                 ` Wolfgang Rohdewald
  0 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2019-06-26  1:26 UTC (permalink / raw)


On Tue, Jun 25, 2019@05:07:30PM +0200, Wolfgang Rohdewald wrote:
> On Di, 2019-06-25@21:29 +0800, Ming Lei wrote:
> > On Tue, Jun 25, 2019@02:11:04PM +0200, Wolfgang Rohdewald wrote:
> > > On Di, 2019-06-25@11:50 +0200, Christoph Hellwig wrote:
> > > > On Tue, Jun 25, 2019@05:45:35PM +0800, Ming Lei wrote:
> > > > > On Tue, Jun 25, 2019@06:38:58PM +0900, Keith Busch wrote:
> > > > > > The first two sg elements should have been merged into one, which would have
> > > > > > avoided the error. The bug is in the kernel; it has nothing to do with the device.
> > > > > 
> > > > > The default max segment size is 64k, so the first two can't be merged.
> > > > > 
> > > > > And the 1st sg should have been dispatched as a single request.
> > > > 
> > > > Well, that is exactly what I fixed with my patches that went into
> > > > 5.2.  Looks like we need to backport those as well.
> > > > 
> > > > Wolfgang, can you try the latest 5.2-rc git tree?
> > > 
> > > 5.2.0-rc6 works fine.
> > > 
> > > Do you still want me to apply nvme_dump_rq() to 5.1 to find the root cause?
> > 
> > Yeah, please dump the request; maybe there is an issue in the merge code.
> > BTW, please replace the trace_printk with printk in the debug patch, so the
> > output lands in the kernel log rather than the ftrace buffer.
> > 
> > Also, I'm not sure Christoph's patch can be backported safely, given there
> > was a recent report about too big a max segment size.
> > 
> > Thanks,
> > Ming
> 
> 
> Jun 25 17:05:05 skull kernel: [  150.419778] sg[0] phys_addr:0x000000049cbe9000 offset:0 length:32256 dma_address:0x000000049cbe9000 dma_length:32256
> Jun 25 17:05:05 skull kernel: [  150.419780] sg[1] phys_addr:0x000000049cbf0e00 offset:3584 length:40960 dma_address:0x000000049cbf0e00 dma_length:40960
> Jun 25 17:05:05 skull kernel: [  150.419783] ------------[ cut here ]------------
> Jun 25 17:05:05 skull kernel: [  150.419784] Invalid SGL for payload:73216 nents:2
> Jun 25 17:05:05 skull kernel: [  150.419820] WARNING: CPU: 2 PID: 3566 at drivers/nvme/host/pci.c:771 nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
> Jun 25 17:05:05 skull kernel: [  150.419821] Modules linked in: ccm rfcomm ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv4 nfs lockd grace
> fscache dummy nfnetlink_queue nfnetlink_log nfnetlink cmac bnep btusb btrtl btbcm btintel bluetooth joydev input_leds ecdh_generic overlay binfmt_misc nls_iso8859_1 arc4 snd_hda_codec_hdmi intel_rapl
> x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul snd_hda_core crc32_pclmul snd_hwdep snd_pcm ghash_clmulni_intel
> snd_seq_midi snd_seq_midi_event iwlmvm snd_rawmidi mac80211 iwlwifi snd_seq aesni_intel aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device snd_timer intel_cstate intel_rapl_perf snd cfg80211
> soundcore mei_me intel_pch_thermal mei ir_rc6_decoder wmi
> Jun 25 17:05:05 skull kernel: [  150.419845]  rc_rc6_mce nuvoton_cir rc_core acpi_pad squashfs sch_fq_codel sunrpc nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 hid_logitech_hidpp
> hid_logitech_dj hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme sdhci_pci drm e1000e cqhci i2c_i801 nvme_core sdhci video
> Jun 25 17:05:05 skull kernel: [  150.419857] CPU: 2 PID: 3566 Comm: AioMgr0-N Tainted: G           OE     5.1.15- #1
> Jun 25 17:05:05 skull kernel: [  150.419857] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0062.2019.0507.1702 05/07/2019
> Jun 25 17:05:05 skull kernel: [  150.419860] RIP: 0010:nvme_pci_setup_prps.isra.48+0x372/0x3d0 [nvme]
> Jun 25 17:05:05 skull kernel: [  150.419861] Code: 48 c7 c7 c0 85 0b c0 e8 2c 08 fb da 41 8b 97 4c 01 00 00 41 f6 47 1e 04 75 57 41 8b 77 24 48 c7 c7 38 4a 0b c0 e8 78 fb bb da <0f> 0b e9 28 ff ff ff
> b8 0a 00 00 00 e9 f3 fe ff ff 4c 8b 7d 98 4d
> Jun 25 17:05:05 skull kernel: [  150.419863] RSP: 0018:ffffa39186847558 EFLAGS: 00010282
> Jun 25 17:05:05 skull kernel: [  150.419864] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> Jun 25 17:05:05 skull kernel: [  150.419865] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff95d61ea954f0
> Jun 25 17:05:05 skull kernel: [  150.419865] RBP: ffffa391868475e8 R08: 000000000000066d R09: 0000000000000004
> Jun 25 17:05:05 skull kernel: [  150.419866] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> Jun 25 17:05:05 skull kernel: [  150.419867] R13: 0000000000000002 R14: 0000000000000007 R15: ffff95d60feed000
> Jun 25 17:05:05 skull kernel: [  150.419868] FS:  00007ff4d6168700(0000) GS:ffff95d61ea80000(0000) knlGS:0000000000000000
> Jun 25 17:05:05 skull kernel: [  150.419869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 25 17:05:05 skull kernel: [  150.419870] CR2: ffffe20ea05ec2e8 CR3: 0000000500d24004 CR4: 00000000003626e0
> Jun 25 17:05:05 skull kernel: [  150.419870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jun 25 17:05:05 skull kernel: [  150.419871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Jun 25 17:05:05 skull kernel: [  150.419872] Call Trace:
> Jun 25 17:05:05 skull kernel: [  150.419878]  nvme_queue_rq+0x566/0x710 [nvme]
> Jun 25 17:05:05 skull kernel: [  150.419883]  __blk_mq_try_issue_directly+0x114/0x1c0
> Jun 25 17:05:05 skull kernel: [  150.419886]  ? ktime_get_raw_ts64+0x30/0xc0
> Jun 25 17:05:05 skull kernel: [  150.419888]  blk_mq_request_issue_directly+0x48/0xe0
> Jun 25 17:05:05 skull kernel: [  150.419890]  blk_mq_try_issue_list_directly+0x41/0xb0
> Jun 25 17:05:05 skull kernel: [  150.419893]  blk_mq_sched_insert_requests+0xa5/0xf0
> Jun 25 17:05:05 skull kernel: [  150.419895]  blk_mq_flush_plug_list+0x211/0x2c0
> Jun 25 17:05:05 skull kernel: [  150.419898]  blk_flush_plug_list+0xc6/0xf0
> Jun 25 17:05:05 skull kernel: [  150.419900]  blk_mq_make_request+0x208/0x580
> Jun 25 17:05:05 skull kernel: [  150.419902]  generic_make_request+0x19e/0x400
> Jun 25 17:05:05 skull kernel: [  150.419904]  submit_bio+0x45/0x140
> Jun 25 17:05:05 skull kernel: [  150.419908]  do_blockdev_direct_IO+0x21e4/0x2e30
> Jun 25 17:05:05 skull kernel: [  150.419912]  ? find_get_pages_range_tag+0x81/0x2d0
> Jun 25 17:05:05 skull kernel: [  150.419915]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
> Jun 25 17:05:05 skull kernel: [  150.419917]  ext4_direct_IO+0x359/0x740
> Jun 25 17:05:05 skull kernel: [  150.419920]  ? touch_atime+0x33/0xe0
> Jun 25 17:05:05 skull kernel: [  150.419923]  generic_file_read_iter+0x9c/0xc90
> Jun 25 17:05:05 skull kernel: [  150.419925]  aio_read+0xec/0x160
> Jun 25 17:05:05 skull kernel: [  150.419929]  ? tomoyo_init_request_info+0x84/0x90
> Jun 25 17:05:05 skull kernel: [  150.419930]  ? tomoyo_path_number_perm+0x68/0x1e0
> Jun 25 17:05:05 skull kernel: [  150.419933]  ? _cond_resched+0x15/0x30
> Jun 25 17:05:05 skull kernel: [  150.419934]  ? __kmalloc+0x166/0x200
> Jun 25 17:05:05 skull kernel: [  150.419949]  ? rtR0MemAllocEx+0x14c/0x240 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [  150.419958]  ? supdrvIOCtl+0xcc1/0x36e0 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [  150.419959]  ? kmem_cache_alloc+0x150/0x1c0
> Jun 25 17:05:05 skull kernel: [  150.419961]  io_submit_one+0x171/0xb20
> Jun 25 17:05:05 skull kernel: [  150.419967]  ? VBoxDrvLinuxIOCtl_6_0_6+0x188/0x240 [vboxdrv]
> Jun 25 17:05:05 skull kernel: [  150.419969]  ? __x64_sys_io_submit+0xa9/0x190
> Jun 25 17:05:05 skull kernel: [  150.419971]  ? lookup_ioctx+0x1d/0x90
> Jun 25 17:05:05 skull kernel: [  150.419973]  __x64_sys_io_submit+0xa9/0x190
> Jun 25 17:05:05 skull kernel: [  150.419975]  do_syscall_64+0x5f/0x190
> Jun 25 17:05:05 skull kernel: [  150.419978]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Jun 25 17:05:05 skull kernel: [  150.419980] RIP: 0033:0x7ff539ace2e9
> Jun 25 17:05:05 skull kernel: [  150.419981] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
> 01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
> Jun 25 17:05:05 skull kernel: [  150.419982] RSP: 002b:00007ff4d6167b98 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> Jun 25 17:05:05 skull kernel: [  150.419984] RAX: ffffffffffffffda RBX: 00007ff4b801c9b0 RCX: 00007ff539ace2e9
> Jun 25 17:05:05 skull kernel: [  150.419984] RDX: 00007ff4d6167c60 RSI: 0000000000000005 RDI: 00007ff50802c000
> Jun 25 17:05:05 skull kernel: [  150.419985] RBP: 00007ff4d6167bc0 R08: 0000000000000001 R09: 00007ff4d6167c70
> Jun 25 17:05:05 skull kernel: [  150.419986] R10: 00007ff4b801b960 R11: 0000000000000246 R12: 00007ff4d6167c60
> Jun 25 17:05:05 skull kernel: [  150.419986] R13: 0000000000000005 R14: 00007ff4e0940100 R15: 0000000000000005
> Jun 25 17:05:05 skull kernel: [  150.419988] ---[ end trace 581bcb8c21b397f6 ]---
> Jun 25 17:05:05 skull kernel: [  150.419994] bio 0: 7602/0 1 2105917377 32256
> Jun 25 17:05:05 skull kernel: [  150.420017] bio 1: 6602/0 1 2105917440 40960
> Jun 25 17:05:05 skull kernel: [  150.420022] 0: pfn 4836329 offset 0 len 32256
> Jun 25 17:05:05 skull kernel: [  150.420027] 1: pfn 4836336 offset 3584 len 40960

I understand the issue now. It isn't related to bio splitting; it is in
the normal bio merge code: biovec_phys_mergeable() merges two bios
whenever their bvecs are physically contiguous, without considering the
max segment size limit. That limit is still applied in blk_bvec_map_sg(),
so an sg entry with an unaligned virt boundary is produced.
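
To make the failure concrete, here is a minimal user-space sketch (not
kernel code; the addresses and lengths are copied from the sg dump above,
and the 64k max segment size and 4k virt boundary are the defaults
mentioned earlier in the thread) that replays the arithmetic:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_SEGMENT_SIZE   (64U * 1024U)  /* default queue limit */
#define VIRT_BOUNDARY_MASK 0xfffULL       /* NVMe PRPs need 4k alignment */

int main(void)
{
	uint64_t sg0_addr = 0x49cbe9000ULL; uint32_t sg0_len = 32256;
	uint64_t sg1_addr = 0x49cbf0e00ULL; uint32_t sg1_len = 40960;

	/* The bvecs are physically contiguous, which is why the
	 * merge code glued the two bios together ... */
	bool contiguous = sg0_addr + sg0_len == sg1_addr;

	/* ... but together they exceed the max segment size, so they
	 * can never form one valid segment ... */
	bool too_big = sg0_len + sg1_len > MAX_SEGMENT_SIZE;

	/* ... and the second sg entry does not start on the virt
	 * boundary, which nvme_pci_setup_prps() rejects with the
	 * "Invalid SGL" warning. */
	bool misaligned = (sg1_addr & VIRT_BOUNDARY_MASK) != 0;

	printf("contiguous=%d too_big=%d misaligned=%d\n",
	       contiguous, too_big, misaligned);  /* prints 1 1 1 */
	return 0;
}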

Christoph's patch 09324d32d2a08 ("block: force an unlimited segment size
on queues with a virt boundary") can fix the issue, but it might cause
another regression, because from the IOMMU's point of view the segments
can then become too big.

And the following patch should fix the 5.1 issue too:

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1c9d4f0f96ea..ff69b7dd2179 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
 	 */
 	bio_get_last_bvec(prev, &pb);
 	bio_get_first_bvec(next, &nb);
-	if (bios_segs_mergeable(q, prev, &pb, &nb))
+	if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
+			next->bi_seg_front_size < queue_max_segment_size(q))
 		return false;
 	return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 }
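
Restating the idea of the one-line change as a standalone predicate (a
sketch only; bi_seg_back_size and bi_seg_front_size are the 5.1-era bio
fields tracking how much merged-segment data hangs off the back of the
previous bio and the front of the next one):

/*
 * Allow the contiguous-bvec shortcut only when the merged segment
 * would still fit within the queue's segment limit; otherwise
 * bio_will_gap() falls through to the virt-boundary gap check and
 * the merge is refused.
 */
static inline bool merged_seg_fits(unsigned int back_size,
				   unsigned int front_size,
				   unsigned int max_seg_size)
{
	return back_size + front_size < max_seg_size;
}

With the numbers from the dump, 32256 + 40960 = 73216 >= 65536, so the
shortcut no longer applies, the gap check sees the unaligned boundary,
and the two bios end up in separate requests that the PRP code can
describe.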

Thanks,
Ming


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-26  1:26               ` Ming Lei
@ 2019-06-26 20:18                 ` Wolfgang Rohdewald
  2019-07-04 12:09                 ` Wolfgang Rohdewald
  1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-06-26 20:18 UTC (permalink / raw)


On Wed, 2019-06-26 at 09:26 +0800, Ming Lei wrote:
> And the following patch should fix the 5.1 issue too:
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 1c9d4f0f96ea..ff69b7dd2179 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
>          */
>         bio_get_last_bvec(prev, &pb);
>         bio_get_first_bvec(next, &nb);
> -       if (bios_segs_mergeable(q, prev, &pb, &nb))
> +       if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
> +                       next->bi_seg_front_size < queue_max_segment_size(q))
>                 return false;
>         return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
>  }

So far, this works for me with 5.1.15. Before, the bug would trigger within
seconds of starting a Windows 10 update in VirtualBox 6.0.6; now it does not.

-- 
With kind regards

Wolfgang Rohdewald


* [nvme] Invalid SGL for payload:91648 nents:3
  2019-06-26  1:26               ` Ming Lei
  2019-06-26 20:18                 ` Wolfgang Rohdewald
@ 2019-07-04 12:09                 ` Wolfgang Rohdewald
  1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Rohdewald @ 2019-07-04 12:09 UTC (permalink / raw)


On Wed, 2019-06-26 at 09:26 +0800, Ming Lei wrote:
> I understand the issue now. It isn't related to bio splitting; it is in
> the normal bio merge code: biovec_phys_mergeable() merges two bios
> whenever their bvecs are physically contiguous, without considering the
> max segment size limit. That limit is still applied in blk_bvec_map_sg(),
> so an sg entry with an unaligned virt boundary is produced.
> 
> Christoph's patch 09324d32d2a08 ("block: force an unlimited segment size
> on queues with a virt boundary") can fix the issue, but it might cause
> another regression, because from the IOMMU's point of view the segments
> can then become too big.
> 
> And the following patch should fix the 5.1 issue too:
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 1c9d4f0f96ea..ff69b7dd2179 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -60,7 +60,8 @@ static inline bool bio_will_gap(struct request_queue *q,
>          */
>         bio_get_last_bvec(prev, &pb);
>         bio_get_first_bvec(next, &nb);
> -       if (bios_segs_mergeable(q, prev, &pb, &nb))
> +       if (bios_segs_mergeable(q, prev, &pb, &nb) && prev->bi_seg_back_size +
> +                       next->bi_seg_front_size < queue_max_segment_size(q))
>                 return false;
>         return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
>  }

I wonder why this fix is not in 5.1.16.

-- 
With kind regards

Wolfgang Rohdewald


