* NVMe over Fabrics I/O Error with RDMA/RXE
@ 2018-07-09  4:06 Bairen YI
  2018-07-09  8:17 ` Leon Romanovsky
  2018-07-09 10:42 ` Sagi Grimberg
  0 siblings, 2 replies; 6+ messages in thread
From: Bairen YI @ 2018-07-09  4:06 UTC (permalink / raw)


Hi all,

I am testing NVMe over Fabrics on a Linux 4.15 VM (latest Ubuntu 18.04 LTS, unmodified kernel), with an NVMe card and SoftRoCE (RXE) as the transport.

The client connects to the NVMeof target just fine. However, I get I/O errors and timeouts when reading from or writing to the target over RXE.
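
For reference, the setup was along these lines (a sketch, not the literal commands; the subsystem name "nvmeof", namespace ID 10, interfaces and addresses are taken from the logs below, everything else is an assumption):

# target (10.140.0.2)
modprobe rdma_rxe && rxe_cfg add ens5   # or: echo ens5 > /sys/module/rdma_rxe/parameters/add
modprobe nvmet-rdma
cd /sys/kernel/config/nvmet
mkdir subsystems/nvmeof
echo 1 > subsystems/nvmeof/attr_allow_any_host
mkdir subsystems/nvmeof/namespaces/10
echo -n /dev/nvme0n1 > subsystems/nvmeof/namespaces/10/device_path
echo 1 > subsystems/nvmeof/namespaces/10/enable
mkdir ports/1
echo rdma       > ports/1/addr_trtype
echo ipv4       > ports/1/addr_adrfam
echo 10.140.0.2 > ports/1/addr_traddr
echo 4420       > ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/nvmeof ports/1/subsystems/nvmeof

# client
modprobe rdma_rxe && rxe_cfg add ens4
modprobe nvme-rdma
nvme connect -t rdma -n nvmeof -a 10.140.0.2 -s 4420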

NVMeof target dmesg:

[  207.924912] rdma_rxe: loaded
[  207.974258] rdma_rxe: set rxe0 active
[  207.974259] rdma_rxe: added rxe0 to ens5
[  208.136753] RPC: Registered named UNIX socket transport module.
[  208.136754] RPC: Registered udp transport module.
[  208.136755] RPC: Registered tcp transport module.
[  208.136755] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  208.221549] RPC: Registered rdma transport module.
[  208.221550] RPC: Registered rdma backchannel transport module.
[  814.714362] nvmet: adding nsid 10 to subsystem nvmeof
[  883.716604] nvmet_rdma: enabling port 1 (10.140.0.2:4420)

NVMeof client dmesg:

[  210.410272] rdma_rxe: loaded
[  210.438950] rdma_rxe: set rxe0 active
[  210.438951] rdma_rxe: added rxe0 to ens4
[  210.575369] RPC: Registered named UNIX socket transport module.
[  210.575370] RPC: Registered udp transport module.
[  210.575371] RPC: Registered tcp transport module.
[  210.575371] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  210.623886] RPC: Registered rdma transport module.
[  210.623887] RPC: Registered rdma backchannel transport module.
[  914.043768] nvme nvme0: creating 4 I/O queues.
[  914.049079] nvme nvme0: new ctrl: NQN "nvmeof", addr 10.140.0.2:4420

After running an `fio` benchmark (a representative command is sketched below the error output), the client reports errors:

fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=19457638400, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=59311718400, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=283416395776, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=310615080960, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=170021748736, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=215924080640, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=271643115520, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=302967488512, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=346368442368, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=373731688448, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=230237929472, buflen=65536
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=218070908928, buflen=65536
fio: pid=1963, err=5/file:io_u.c:1756, func=io_u error, error=Input/output error
fio: pid=1956, err=5/file:io_u.c:1756, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: read offset=369396809728, buflen=65
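
For reference, a random-read fio job along these lines produces the 64 KiB reads seen above (hypothetical -- the exact command used is not included in this report):

fio --name=randread --filename=/dev/nvme0n1 --direct=1 --rw=randread \
    --bs=64k --ioengine=libaio --iodepth=32 --numjobs=4 \
    --time_based --runtime=60 --group_reporting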

NVMeof target dmesg:

[  814.714362] nvmet: adding nsid 10 to subsystem nvmeof
[  883.716604] nvmet_rdma: enabling port 1 (10.140.0.2:4420)
[  916.450484] nvmet: creating controller 1 for subsystem nvmeof for NQN nqn.2014-08.org.nvmexpress:uuid:2f1af066-4ffe-4c77-aa5e-5a062ae3f561.
[  916.456273] nvmet: adding queue 1 to ctrl 1.
[  916.456413] nvmet: adding queue 2 to ctrl 1.
[  916.456566] nvmet: adding queue 3 to ctrl 1.
[  916.456689] nvmet: adding queue 4 to ctrl 1.
[ 1004.436366] nvmet_rdma: freeing queue 1
[ 1004.436619] nvmet_rdma: freeing queue 2
[ 1004.436825] nvmet_rdma: freeing queue 3
[ 1004.437534] nvmet_rdma: freeing queue 4
[ 1004.452101] nvmet_rdma: freeing queue 0
[ 1014.685771] nvmet: creating controller 1 for subsystem nvmeof for NQN nqn.2014-08.org.nvmexpress:uuid:2f1af066-4ffe-4c77-aa5e-5a062ae3f561.
[ 1014.690580] nvmet: adding queue 1 to ctrl 1.
[ 1014.690862] nvmet: adding queue 2 to ctrl 1.
[ 1014.691208] nvmet: adding queue 3 to ctrl 1.
[ 1014.691348] nvmet: adding queue 4 to ctrl 1.

NVMeof client dmesg:

[  914.043768] nvme nvme0: creating 4 I/O queues.
[  914.049079] nvme nvme0: new ctrl: NQN "nvmeof", addr 10.140.0.2:4420
[ 1001.936366] nvme nvme0: I/O 1 QID 3 timeout, reset controller
[ 1001.936386] print_req_error: I/O error, dev nvme0c1n1, sector 38003200
[ 1001.943222] nvme nvme0: I/O 2 QID 3 timeout, reset controller
[ 1001.943225] print_req_error: I/O error, dev nvme0c1n1, sector 115843200
[ 1001.950312] nvme nvme0: I/O 3 QID 3 timeout, reset controller
[ 1001.950400] print_req_error: I/O error, dev nvme0c1n1, sector 591733376
[ 1001.958740] nvme nvme0: I/O 4 QID 3 timeout, reset controller
[ 1001.958750] print_req_error: I/O error, dev nvme0c1n1, sector 676500864
[ 1001.967384] nvme nvme0: I/O 5 QID 3 timeout, reset controller
[ 1001.967389] print_req_error: I/O error, dev nvme0c1n1, sector 553547648
[ 1001.975602] nvme nvme0: I/O 6 QID 3 timeout, reset controller
[ 1001.975606] print_req_error: I/O error, dev nvme0c1n1, sector 606670080
[ 1001.983731] nvme nvme0: I/O 7 QID 3 timeout, reset controller
[ 1001.983735] print_req_error: I/O error, dev nvme0c1n1, sector 721478144
[ 1001.991889] nvme nvme0: I/O 65 QID 3 timeout, reset controller
[ 1001.991892] print_req_error: I/O error, dev nvme0c1n1, sector 332073728
[ 1002.000104] nvme nvme0: I/O 66 QID 3 timeout, reset controller
[ 1002.000107] print_req_error: I/O error, dev nvme0c1n1, sector 421726720
[ 1002.008453] nvme nvme0: I/O 67 QID 3 timeout, reset controller
[ 1002.008457] print_req_error: I/O error, dev nvme0c1n1, sector 530552960
[ 1002.016657] nvme nvme0: I/O 68 QID 3 timeout, reset controller
[ 1002.016669] nvme nvme0: I/O 69 QID 3 timeout, reset controller
[ 1002.016672] nvme nvme0: I/O 70 QID 3 timeout, reset controller
[ 1002.016682] nvme nvme0: I/O 71 QID 3 timeout, reset controller
[ 1002.016691] nvme nvme0: I/O 72 QID 3 timeout, reset controller
[ 1002.016699] nvme nvme0: I/O 73 QID 3 timeout, reset controller
[ 1002.016702] nvme nvme0: I/O 74 QID 3 timeout, reset controller
[ 1002.016714] nvme nvme0: I/O 75 QID 3 timeout, reset controller
[ 1002.016721] nvme nvme0: I/O 76 QID 3 timeout, reset controller
[ 1002.016726] nvme nvme0: I/O 78 QID 3 timeout, reset controller
[ 1002.016729] nvme nvme0: I/O 79 QID 3 timeout, reset controller
[ 1002.016732] nvme nvme0: I/O 80 QID 3 timeout, reset controller
[ 1002.016737] nvme nvme0: I/O 82 QID 3 timeout, reset controller
[ 1002.016742] nvme nvme0: I/O 83 QID 3 timeout, reset controller
[ 1002.016745] nvme nvme0: I/O 86 QID 3 timeout, reset controller
[ 1002.016752] nvme nvme0: I/O 88 QID 3 timeout, reset controller
[ 1002.016757] nvme nvme0: I/O 90 QID 3 timeout, reset controller
[ 1002.016761] nvme nvme0: I/O 91 QID 3 timeout, reset controller
[ 1002.016764] nvme nvme0: I/O 92 QID 3 timeout, reset controller
[ 1002.016772] nvme nvme0: I/O 93 QID 3 timeout, reset controller
[ 1002.016776] nvme nvme0: I/O 94 QID 3 timeout, reset controller
[ 1002.016779] nvme nvme0: I/O 95 QID 3 timeout, reset controller
[ 1002.016783] nvme nvme0: I/O 96 QID 3 timeout, reset controller
[ 1002.016787] nvme nvme0: I/O 97 QID 3 timeout, reset controller
[ 1002.016791] nvme nvme0: I/O 98 QID 3 timeout, reset controller
[ 1002.016797] nvme nvme0: I/O 99 QID 3 timeout, reset controller
[ 1002.016801] nvme nvme0: I/O 100 QID 3 timeout, reset controller
[ 1002.016804] nvme nvme0: I/O 101 QID 3 timeout, reset controller
[ 1002.016810] nvme nvme0: I/O 102 QID 3 timeout, reset controller
[ 1002.016816] nvme nvme0: I/O 103 QID 3 timeout, reset controller
[ 1002.016822] nvme nvme0: I/O 104 QID 3 timeout, reset controller
[ 1002.016826] nvme nvme0: I/O 105 QID 3 timeout, reset controller
[ 1002.016829] nvme nvme0: I/O 106 QID 3 timeout, reset controller
[ 1002.016836] nvme nvme0: I/O 107 QID 3 timeout, reset controller
[ 1002.016843] nvme nvme0: I/O 108 QID 3 timeout, reset controller
[ 1002.016849] nvme nvme0: I/O 109 QID 3 timeout, reset controller
[ 1002.016855] nvme nvme0: I/O 110 QID 3 timeout, reset controller
[ 1002.016858] nvme nvme0: I/O 111 QID 3 timeout, reset controller
[ 1002.016865] nvme nvme0: I/O 112 QID 3 timeout, reset controller
[ 1002.016868] nvme nvme0: I/O 113 QID 3 timeout, reset controller
[ 1002.016874] nvme nvme0: I/O 114 QID 3 timeout, reset controller
[ 1002.016878] nvme nvme0: I/O 115 QID 3 timeout, reset controller
[ 1002.016882] nvme nvme0: I/O 116 QID 3 timeout, reset controller
[ 1002.016887] nvme nvme0: I/O 117 QID 3 timeout, reset controller
[ 1002.016891] nvme nvme0: I/O 118 QID 3 timeout, reset controller
[ 1002.016898] nvme nvme0: I/O 119 QID 3 timeout, reset controller
[ 1002.016902] nvme nvme0: I/O 120 QID 3 timeout, reset controller
[ 1002.016907] nvme nvme0: I/O 121 QID 3 timeout, reset controller
[ 1002.016910] nvme nvme0: I/O 122 QID 3 timeout, reset controller
[ 1002.016914] nvme nvme0: I/O 123 QID 3 timeout, reset controller
[ 1002.016919] nvme nvme0: I/O 124 QID 3 timeout, reset controller
[ 1002.016924] nvme nvme0: I/O 125 QID 3 timeout, reset controller
[ 1002.016928] nvme nvme0: I/O 126 QID 3 timeout, reset controller
[ 1002.024449] block nvme0n1: no path available - requeuing I/O
[ 1002.040643] nvme nvme0: Reconnecting in 10 seconds...
[ 1012.274698] nvme nvme0: creating 4 I/O queues.
[ 1012.279370] nvme nvme0: Successfully reconnected (1 attempts)
[ 1012.331792] nvme0c1n1: detected capacity change from 0 to 402653184000

Is NVMeof supported over RXE or other software RDMA implementations (like SoftiWarp) at this point?

Best,
Bairen Yi


* NVMe over Fabrics I/O Error with RDMA/RXE
  2018-07-09  4:06 NVMe over Fabrics I/O Error with RDMA/RXE Bairen YI
@ 2018-07-09  8:17 ` Leon Romanovsky
  2018-07-09 10:42 ` Sagi Grimberg
  1 sibling, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2018-07-09  8:17 UTC (permalink / raw)


On Mon, Jul 09, 2018 at 04:06:05AM +0000, Bairen YI wrote:
> Hi all,
>

<...>

>
> Does NVMeof support over RXE or other software RDMA (like SoftiWarp) now?

NVMeof is card-agnostic, so it is supposed to work on RXE.
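
A quick sanity check that the rxe device is usable from user space (a sketch):

$ rxe_cfg status                     # rxe0 should show up bound to the netdev
$ ibv_devinfo -d rxe0                # port state should be PORT_ACTIVE
$ rping -s &  rping -c -a <rxe IP>   # basic RC traffic over the rxe device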

Thanks

>
> Best,
> Bairen Yi


* NVMe over Fabrics I/O Error with RDMA/RXE
  2018-07-09  4:06 NVMe over Fabrics I/O Error with RDMA/RXE Bairen YI
  2018-07-09  8:17 ` Leon Romanovsky
@ 2018-07-09 10:42 ` Sagi Grimberg
  2018-07-11  9:44   ` Bairen YI
  1 sibling, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2018-07-09 10:42 UTC (permalink / raw)



> Hi all,

Hi Bairen,

> I am testing NVMe over Fabrics on a Linux 4.15 VM (latest Ubuntu 18.04 LTS, unmodified kernel), with an NVMe card and SoftRoCE (RXE) as transport.
> 
> It works just fine for the client connecting NVMeof target. However, it reports I/O error and timeout when read/write to the target over RXE.

Thanks for reporting,

Would it be possible to test with upstream kernel? I'm testing with
4.17.0 and I/O seems to work fine...

I suspect some stable fixes still need to be taken into 4.15.y?

What is your exact kernel version?

The difference from linux-stable-4.15.y is:
--
bb42f87e2924 IB/rxe: avoid unnecessary export
da2f3b286a39 IB/rxe: avoid calling WARN_ON_ONCE twice
e02637e97d68 IB: Fix RDMA_RXE and INFINIBAND_RDMAVT dependencies for DMA_VIRT_OPS
f5e27a203f4c Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
1661d3b0e218 nvmet,rxe: defer ip datagram sending to tasklet
ecb238f6a7f3 IB/cxgb4: use skb_put_zero()/__skb_put_zero
9fd4350ba895 IB/rxe: avoid double kfree_skb
2da36d44a9d5 IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
e12ee8ce5143 IB/rxe: remove unused function variable
0dff463a6a86 IB/rxe: change rxe_set_mtu function type to void
10c47d560603 IB/rxe: Change rxe_rcv to return void
fe896ceb5772 IB/rxe: replace refcount_inc with skb_get
2e47350789eb IB/rxe: optimize the function duplicate_request
8f1a72c815cf IB/rxe: make rxe_release_udp_tunnel static
76be04500be2 IB/rxe: avoid export symbols
0f02ba7ed16a IB/rxe: make the variable static
efc365e7290d IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
39e00b6cf658 IB/rxe: Removed GID add/del dummy routines
414448d249d8 RDMA: Use ib_gid_attr during GID modification
3e44e0ee0893 IB/providers: Avoid null netdev check for RoCE
0e1f9b924471 RDMA/providers: Simplify query_gid callback of RoCE providers
99dae690255e IB/rxe: optimize mcast recv process
f2e9bfac13c9 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
0ede73bc012c IB/uverbs: Extend uverbs_ioctl header with driver_id
0c43ab371bcb RDMA/rxe: Use structs to describe the uABI instead of opencoding
b92ec0fe3224 RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attr
43c9fc509fa5 rdma_rxe: make rxe work over 802.1q VLAN devices
befd8d98f230 IB/rxe: change the function rxe_init_device_param type
31f1bd14cbfe IB/rxe: remove unnecessary rxe in rxe_send
86af61764151 IB/rxe: remove unnecessary skb_clone
a6544a624c3f RDMA/rxe: Fix an out-of-bounds read
c33bab622d47 IB/rxe: Remove unused variable (char *rxe_qp_state_name[])
7061f28d8a2f rxe: Do not use 'struct sockaddr' in a uapi header
316663c6bfeb IB/rxe: remove redudant parameter in rxe_av_fill_ip_info
45a290f8f6c6 IB/rxe: change the function rxe_av_fill_ip_info to void
a402dc445dd6 IB/rxe: change the function to void from int
9c96f3d4ddba IB/rxe: remove unnecessary parameter in rxe_av_to_attr
ca3d9feee6a9 IB/rxe: change the function to void from int
1a241db19ca9 IB/rxe: remove redudant parameter in function
0812ed132178 IB/rxe: Change RDMA_RXE kconfig to use select
--
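
For completeness, a list like the one above can be regenerated from a kernel tree with something along these lines (the exact range and path filter are assumptions):

git log --oneline v4.15..v4.17 -- drivers/infiniband/sw/rxe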

Possible suspect might be:
--
commit 2da36d44a9d54a2c6e1f8da1f7ccc26b0bc6cfec
Author: Jianchao Wang <jianchao.w.wang at oracle.com>
Date:   Thu Apr 26 11:52:39 2018 +0800

     IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV

     w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
     will not be updated in update_wqe_psn, and the corresponding
     wqe will not be acked in rxe_completer due to its last_psn is
     zero. Finally, the other wqe will also not be able to be acked,
     because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
     is still there. This causes large amount of io timeout when
     nvmeof is over rxe.

     Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

     Signed-off-by: Jianchao Wang <jianchao.w.wang at oracle.com>
     Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
     Signed-off-by: Doug Ledford <dledford at redhat.com>
--

Another useful stable candidate would be:
--
commit 1661d3b0e2183ce90f6611641c350a5aa02aaa80
Author: Alexandru Moise <00moses.alexander00 at gmail.com>
Date:   Tue May 8 11:02:02 2018 +0200

     nvmet,rxe: defer ip datagram sending to tasklet

     This addresses 3 separate problems:

     1. When using NVME over Fabrics we may end up sending IP
     packets in interrupt context, we should defer this work
     to a tasklet.

     [   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
     [   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
     [   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
     [   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
     [   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
     [   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
     [   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
     [   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
     [   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
     [   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
     [   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
     [   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
     [   50.966121] Call Trace:
     [   50.966845]  <IRQ>
     [   50.967497]  __dev_queue_xmit+0x62d/0x690
     [   50.968722]  dev_queue_xmit+0x10/0x20
     [   50.969894]  neigh_resolve_output+0x173/0x190
     [   50.971244]  ip_finish_output2+0x2b8/0x370
     [   50.972527]  ip_finish_output+0x1d2/0x220
     [   50.973785]  ? ip_finish_output+0x1d2/0x220
     [   50.975010]  ip_output+0xd4/0x100
     [   50.975903]  ip_local_out+0x3b/0x50
     [   50.976823]  rxe_send+0x74/0x120
     [   50.977702]  rxe_requester+0xe3b/0x10b0
     [   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
     [   50.980260]  rxe_do_task+0x85/0x100
     [   50.981386]  rxe_run_task+0x2f/0x40
     [   50.982470]  rxe_post_send+0x51a/0x550
     [   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
     [   50.985024]  __nvmet_req_complete+0x95/0xa0
     [   50.986287]  nvmet_req_complete+0x15/0x60
     [   50.987469]  nvmet_bio_done+0x2d/0x40
     [   50.988564]  bio_endio+0x12c/0x140
     [   50.989654]  blk_update_request+0x185/0x2a0
     [   50.990947]  blk_mq_end_request+0x1e/0x80
     [   50.991997]  nvme_complete_rq+0x1cc/0x1e0
     [   50.993171]  nvme_pci_complete_rq+0x117/0x120
     [   50.994355]  __blk_mq_complete_request+0x15e/0x180
     [   50.995988]  blk_mq_complete_request+0x6f/0xa0
     [   50.997304]  nvme_process_cq+0xe0/0x1b0
     [   50.998494]  nvme_irq+0x28/0x50
     [   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
     [   51.000986]  handle_irq_event_percpu+0x32/0x80
     [   51.002356]  handle_irq_event+0x3c/0x60
     [   51.003463]  handle_edge_irq+0x1c9/0x200
     [   51.004473]  handle_irq+0x23/0x30
     [   51.005363]  do_IRQ+0x46/0xd0
     [   51.006182]  common_interrupt+0xf/0xf
     [   51.007129]  </IRQ>

     2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
     when using NVMEoF in order to solve lock ordering between neigh->ha_lock
     seqlock and the nvme queue lock:

     [   77.833783]  Possible interrupt unsafe locking scenario:
     [   77.833783]
     [   77.835831]        CPU0                    CPU1
     [   77.837129]        ----                    ----
     [   77.838313]   lock(&(&n->ha_lock)->seqcount);
     [   77.839550]                                local_irq_disable();
     [   77.841377] lock(&(&nvmeq->q_lock)->rlock);
     [   77.843222] lock(&(&n->ha_lock)->seqcount);
     [   77.845178]   <Interrupt>
     [   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
     [   77.847986]
     [   77.847986]  *** DEADLOCK ***

     3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:

     [   47.634271]  Possible interrupt unsafe locking scenario:
     [   47.634271]
     [   47.636452]        CPU0                    CPU1
     [   47.637861]        ----                    ----
     [   47.639285]   lock(&(&sch->q.lock)->rlock);
     [   47.640654]                                local_irq_disable();
     [   47.642451] lock(&(&nvmeq->q_lock)->rlock);
     [   47.644521] lock(&(&sch->q.lock)->rlock);
     [   47.646480]   <Interrupt>
     [   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
     [   47.648492]
     [   47.648492]  *** DEADLOCK ***

     Using NVMEoF after this patch seems to finally be stable, without it,
     rxe eventually deadlocks the whole system and causes RCU stalls.

     Signed-off-by: Alexandru Moise <00moses.alexander00 at gmail.com>
     Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
     Signed-off-by: Doug Ledford <dledford at redhat.com>
--


* NVMe over Fabrics I/O Error with RDMA/RXE
  2018-07-09 10:42 ` Sagi Grimberg
@ 2018-07-11  9:44   ` Bairen YI
  2018-07-11 10:37     ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Bairen YI @ 2018-07-11  9:44 UTC (permalink / raw)


Hi Sagi,

Thanks.

> On 9 Jul 2018, at 18:42, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> 
>> Hi all,
> 
> Hi Bairen,
> 
>> I am testing NVMe over Fabrics on a Linux 4.15 VM (latest Ubuntu 18.04 LTS, unmodified kernel), with an NVMe card and SoftRoCE (RXE) as transport.
>> It works just fine for the client connecting NVMeof target. However, it reports I/O error and timeout when read/write to the target over RXE.
> 
> Thanks for reporting,
> 
> Would it be possible to test with upstream kernel? I'm testing with
> 4.17.0 and I/O seems to work fine...
> 
> I suspect that probably some stable fixes need to be taken to 4.15.y?
> 
> What is your exact kernel version?

The exact kernel version is 4.15.0-23.25 as in 

https://packages.ubuntu.com/bionic/linux-image-4.15.0-23-generic

I tried again with the latest Debian unstable (kernel version 4.16.5) and the symptoms remain the same.
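
The loopback configuration was roughly the following (a sketch; device and interface names are assumptions, with the rest of the nvmet configfs layout as sketched in my first message):

modprobe null_blk            # /dev/nullb0 as the backing device
rxe_cfg add eth1             # RXE on the local interface (10.130.27.157)
echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/nvmeof/namespaces/10/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/nvmeof/namespaces/10/enable
nvme discover -t rdma -a 10.130.27.157 -s 4420
nvme connect  -t rdma -n nvmeof -a 10.130.27.157 -s 4420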

Dmesg: (with loopback RXE and null block NVMe device)

[ 6842.291260] rdma_rxe: loaded
[ 6842.304579] rdma_rxe: set rxe0 active
[ 6842.305832] rdma_rxe: added rxe0 to eth1
[ 6842.339164] Loading iSCSI transport class v2.0-870.
[ 6842.361792] iscsi: registered transport (iser)
[ 6842.383476] RPC: Registered named UNIX socket transport module.
[ 6842.385534] RPC: Registered udp transport module.
[ 6842.387039] RPC: Registered tcp transport module.
[ 6842.388838] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 6842.410984] RPC: Registered rdma transport module.
[ 6842.412464] RPC: Registered rdma backchannel transport module.
[185834.283674] rdma_rxe: already configured on eth1
[185851.343314] rdma_rxe: unloaded
[185858.563434] rdma_rxe: loaded
[185858.574959] rdma_rxe: set rxe0 active
[185858.576092] rdma_rxe: added rxe0 to eth1
[185865.446437] nvmet: adding nsid 10 to subsystem nvmeof
[185877.282562] null: module loaded
[185964.432807] nvmet_rdma: enabling port 1 (10.130.27.157:4420)
[185997.428294] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:eff231b2-d35d-4d55-9c1c-af9bc0bc7513.
[185997.432770] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.130.27.157:4420
[186019.995552] nvmet: creating controller 1 for subsystem nvmeof for NQN nqn.2014-08.org.nvmexpress:uuid:eff231b2-d35d-4d55-9c1c-af9bc0bc7513.
[186019.999684] nvme nvme0: creating 4 I/O queues.
[186020.004922] nvme nvme0: new ctrl: NQN "nvmeof", addr 10.130.27.157:4420
[186135.448783] nvme nvme0: I/O 1 QID 3 timeout, reset controller
[186135.450532] print_req_error: I/O error, dev nvme0c1n1, sector 158839936
[186135.452608] nvme nvme0: I/O 2 QID 3 timeout, reset controller
[186135.454420] print_req_error: I/O error, dev nvme0c1n1, sector 151934592
[186135.456783] nvme nvme0: I/O 3 QID 3 timeout, reset controller
[186135.458578] print_req_error: I/O error, dev nvme0c1n1, sector 427118592
[186135.460564] nvme nvme0: I/O 4 QID 3 timeout, reset controller
[186135.462139] print_req_error: I/O error, dev nvme0c1n1, sector 208058112
[186135.463996] nvme nvme0: I/O 5 QID 3 timeout, reset controller
[186135.465744] print_req_error: I/O error, dev nvme0c1n1, sector 202496256
[186135.467683] nvme nvme0: I/O 6 QID 3 timeout, reset controller
[186135.469431] print_req_error: I/O error, dev nvme0c1n1, sector 220602880
[186135.471460] nvme nvme0: I/O 7 QID 3 timeout, reset controller
[186135.473098] print_req_error: I/O error, dev nvme0c1n1, sector 184964352
[186135.474866] nvme nvme0: I/O 8 QID 3 timeout, reset controller
[186135.476329] print_req_error: I/O error, dev nvme0c1n1, sector 181691264
[186135.478304] nvme nvme0: I/O 9 QID 3 timeout, reset controller
[186135.480063] print_req_error: I/O error, dev nvme0c1n1, sector 36848384
[186135.482098] nvme nvme0: I/O 10 QID 3 timeout, reset controller
[186135.485114] print_req_error: I/O error, dev nvme0c1n1, sector 501229312
[186135.487968] nvme nvme0: I/O 11 QID 3 timeout, reset controller
[186135.490272] nvme nvme0: I/O 12 QID 3 timeout, reset controller
[186135.492702] nvme nvme0: I/O 13 QID 3 timeout, reset controller
[186135.495424] nvme nvme0: I/O 14 QID 3 timeout, reset controller
[186135.497818] nvme nvme0: I/O 15 QID 3 timeout, reset controller
[186135.501531] nvme nvme0: I/O 16 QID 3 timeout, reset controller
[186135.504271] nvme nvme0: I/O 17 QID 3 timeout, reset controller
[186135.507185] nvme nvme0: I/O 18 QID 3 timeout, reset controller
[186135.509768] nvme nvme0: I/O 19 QID 3 timeout, reset controller
[186135.512342] nvme nvme0: I/O 20 QID 3 timeout, reset controller
[186135.515511] nvme nvme0: I/O 21 QID 3 timeout, reset controller
[186135.518412] nvme nvme0: I/O 22 QID 3 timeout, reset controller
[186135.521632] nvme nvme0: I/O 23 QID 3 timeout, reset controller
[186135.524941] nvme nvme0: I/O 24 QID 3 timeout, reset controller
[186135.527899] nvme nvme0: I/O 25 QID 3 timeout, reset controller
[186135.530993] nvme nvme0: I/O 26 QID 3 timeout, reset controller
[186135.533780] nvme nvme0: I/O 121 QID 3 timeout, reset controller
[186135.536581] nvme nvme0: I/O 122 QID 3 timeout, reset controller
[186135.539624] nvme nvme0: I/O 123 QID 3 timeout, reset controller
[186135.542805] nvme nvme0: I/O 124 QID 3 timeout, reset controller
[186135.545982] nvme nvme0: I/O 125 QID 3 timeout, reset controller
[186135.548889] nvme nvme0: I/O 126 QID 3 timeout, reset controller
[186135.551095] nvme nvme0: I/O 1 QID 4 timeout, reset controller
[186135.553682] nvme nvme0: I/O 2 QID 4 timeout, reset controller
[186135.556214] nvme nvme0: I/O 3 QID 4 timeout, reset controller
[186135.558771] nvme nvme0: I/O 4 QID 4 timeout, reset controller
[186135.561007] nvme nvme0: I/O 5 QID 4 timeout, reset controller
[186135.563897] nvme nvme0: I/O 6 QID 4 timeout, reset controller
[186135.566642] nvme nvme0: I/O 7 QID 4 timeout, reset controller
[186135.569510] nvme nvme0: I/O 8 QID 4 timeout, reset controller
[186135.572048] nvme nvme0: I/O 9 QID 4 timeout, reset controller
[186135.574628] nvme nvme0: I/O 10 QID 4 timeout, reset controller
[186135.576829] nvme nvme0: I/O 11 QID 4 timeout, reset controller
[186135.579154] nvme nvme0: I/O 12 QID 4 timeout, reset controller
[186135.581375] nvme nvme0: I/O 13 QID 4 timeout, reset controller
[186135.583684] nvme nvme0: I/O 14 QID 4 timeout, reset controller
[186135.585642] nvme nvme0: I/O 15 QID 4 timeout, reset controller
[186135.587936] nvme nvme0: I/O 16 QID 4 timeout, reset controller
[186135.590076] nvme nvme0: I/O 17 QID 4 timeout, reset controller
[186135.592338] nvme nvme0: I/O 18 QID 4 timeout, reset controller
[186135.594644] nvme nvme0: I/O 19 QID 4 timeout, reset controller
[186135.597330] nvme nvme0: I/O 20 QID 4 timeout, reset controller
[186135.599994] nvme nvme0: I/O 21 QID 4 timeout, reset controller
[186135.602204] nvme nvme0: I/O 22 QID 4 timeout, reset controller
[186135.604896] nvme nvme0: I/O 23 QID 4 timeout, reset controller
[186135.607214] nvme nvme0: I/O 24 QID 4 timeout, reset controller
[186135.609274] nvme nvme0: I/O 25 QID 4 timeout, reset controller
[186135.611314] nvme nvme0: I/O 26 QID 4 timeout, reset controller
[186135.613598] nvme nvme0: I/O 27 QID 4 timeout, reset controller
[186135.615549] nvme nvme0: I/O 28 QID 4 timeout, reset controller
[186135.617562] nvme nvme0: I/O 29 QID 4 timeout, reset controller
[186135.621237] nvme nvme0: I/O 30 QID 4 timeout, reset controller
[186135.623435] nvme nvme0: I/O 31 QID 4 timeout, reset controller
[186135.625352] nvme nvme0: I/O 32 QID 4 timeout, reset controller
[186135.653786] nvme nvme0: Reconnecting in 10 seconds...
[186145.897284] nvmet: creating controller 1 for subsystem nvmeof for NQN nqn.2014-08.org.nvmexpress:uuid:eff231b2-d35d-4d55-9c1c-af9bc0bc7513.
[186145.905109] nvme nvme0: creating 4 I/O queues.
[186145.911808] nvme nvme0: Successfully reconnected (1 attempts)
[186145.944882] nvme0c1n1: detected capacity change from 0 to 268435456000

I will try the upstream mainline kernel (4.17) in the next few days.

> 
> The difference from linux-stable-4.15.y is:
> --
> bb42f87e2924 IB/rxe: avoid unnecessary export
> da2f3b286a39 IB/rxe: avoid calling WARN_ON_ONCE twice
> e02637e97d68 IB: Fix RDMA_RXE and INFINIBAND_RDMAVT dependencies for DMA_VIRT_OPS
> f5e27a203f4c Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
> 1661d3b0e218 nvmet,rxe: defer ip datagram sending to tasklet
> ecb238f6a7f3 IB/cxgb4: use skb_put_zero()/__skb_put_zero
> 9fd4350ba895 IB/rxe: avoid double kfree_skb
> 2da36d44a9d5 IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
> e12ee8ce5143 IB/rxe: remove unused function variable
> 0dff463a6a86 IB/rxe: change rxe_set_mtu function type to void
> 10c47d560603 IB/rxe: Change rxe_rcv to return void
> fe896ceb5772 IB/rxe: replace refcount_inc with skb_get
> 2e47350789eb IB/rxe: optimize the function duplicate_request
> 8f1a72c815cf IB/rxe: make rxe_release_udp_tunnel static
> 76be04500be2 IB/rxe: avoid export symbols
> 0f02ba7ed16a IB/rxe: make the variable static
> efc365e7290d IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
> 39e00b6cf658 IB/rxe: Removed GID add/del dummy routines
> 414448d249d8 RDMA: Use ib_gid_attr during GID modification
> 3e44e0ee0893 IB/providers: Avoid null netdev check for RoCE
> 0e1f9b924471 RDMA/providers: Simplify query_gid callback of RoCE providers
> 99dae690255e IB/rxe: optimize mcast recv process
> f2e9bfac13c9 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
> 0ede73bc012c IB/uverbs: Extend uverbs_ioctl header with driver_id
> 0c43ab371bcb RDMA/rxe: Use structs to describe the uABI instead of opencoding
> b92ec0fe3224 RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attr
> 43c9fc509fa5 rdma_rxe: make rxe work over 802.1q VLAN devices
> befd8d98f230 IB/rxe: change the function rxe_init_device_param type
> 31f1bd14cbfe IB/rxe: remove unnecessary rxe in rxe_send
> 86af61764151 IB/rxe: remove unnecessary skb_clone
> a6544a624c3f RDMA/rxe: Fix an out-of-bounds read
> c33bab622d47 IB/rxe: Remove unused variable (char *rxe_qp_state_name[])
> 7061f28d8a2f rxe: Do not use 'struct sockaddr' in a uapi header
> 316663c6bfeb IB/rxe: remove redudant parameter in rxe_av_fill_ip_info
> 45a290f8f6c6 IB/rxe: change the function rxe_av_fill_ip_info to void
> a402dc445dd6 IB/rxe: change the function to void from int
> 9c96f3d4ddba IB/rxe: remove unnecessary parameter in rxe_av_to_attr
> ca3d9feee6a9 IB/rxe: change the function to void from int
> 1a241db19ca9 IB/rxe: remove redudant parameter in function
> 0812ed132178 IB/rxe: Change RDMA_RXE kconfig to use select
> --
> 
> Possible suspect might be:
> --
> commit 2da36d44a9d54a2c6e1f8da1f7ccc26b0bc6cfec
> Author: Jianchao Wang <jianchao.w.wang at oracle.com>
> Date:   Thu Apr 26 11:52:39 2018 +0800
> 
>    IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
> 
>    w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
>    will not be updated in update_wqe_psn, and the corresponding
>    wqe will not be acked in rxe_completer due to its last_psn is
>    zero. Finally, the other wqe will also not be able to be acked,
>    because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
>    is still there. This causes large amount of io timeout when
>    nvmeof is over rxe.
> 
>    Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.
> 
>    Signed-off-by: Jianchao Wang <jianchao.w.wang at oracle.com>
>    Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
>    Signed-off-by: Doug Ledford <dledford at redhat.com>
> --
> 
> Another useful stable candidate would be:
> --
> commit 1661d3b0e2183ce90f6611641c350a5aa02aaa80
> Author: Alexandru Moise <00moses.alexander00 at gmail.com>
> Date:   Tue May 8 11:02:02 2018 +0200
> 
>    nvmet,rxe: defer ip datagram sending to tasklet
> 
>    This addresses 3 separate problems:
> 
>    1. When using NVME over Fabrics we may end up sending IP
>    packets in interrupt context, we should defer this work
>    to a tasklet.
> 
>    [   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
>    [   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
>    [   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
>    [   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
>    [   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
>    [   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
>    [   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
>    [   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
>    [   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
>    [   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
>    [   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
>    [   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    [   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
>    [   50.966121] Call Trace:
>    [   50.966845]  <IRQ>
>    [   50.967497]  __dev_queue_xmit+0x62d/0x690
>    [   50.968722]  dev_queue_xmit+0x10/0x20
>    [   50.969894]  neigh_resolve_output+0x173/0x190
>    [   50.971244]  ip_finish_output2+0x2b8/0x370
>    [   50.972527]  ip_finish_output+0x1d2/0x220
>    [   50.973785]  ? ip_finish_output+0x1d2/0x220
>    [   50.975010]  ip_output+0xd4/0x100
>    [   50.975903]  ip_local_out+0x3b/0x50
>    [   50.976823]  rxe_send+0x74/0x120
>    [   50.977702]  rxe_requester+0xe3b/0x10b0
>    [   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
>    [   50.980260]  rxe_do_task+0x85/0x100
>    [   50.981386]  rxe_run_task+0x2f/0x40
>    [   50.982470]  rxe_post_send+0x51a/0x550
>    [   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
>    [   50.985024]  __nvmet_req_complete+0x95/0xa0
>    [   50.986287]  nvmet_req_complete+0x15/0x60
>    [   50.987469]  nvmet_bio_done+0x2d/0x40
>    [   50.988564]  bio_endio+0x12c/0x140
>    [   50.989654]  blk_update_request+0x185/0x2a0
>    [   50.990947]  blk_mq_end_request+0x1e/0x80
>    [   50.991997]  nvme_complete_rq+0x1cc/0x1e0
>    [   50.993171]  nvme_pci_complete_rq+0x117/0x120
>    [   50.994355]  __blk_mq_complete_request+0x15e/0x180
>    [   50.995988]  blk_mq_complete_request+0x6f/0xa0
>    [   50.997304]  nvme_process_cq+0xe0/0x1b0
>    [   50.998494]  nvme_irq+0x28/0x50
>    [   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
>    [   51.000986]  handle_irq_event_percpu+0x32/0x80
>    [   51.002356]  handle_irq_event+0x3c/0x60
>    [   51.003463]  handle_edge_irq+0x1c9/0x200
>    [   51.004473]  handle_irq+0x23/0x30
>    [   51.005363]  do_IRQ+0x46/0xd0
>    [   51.006182]  common_interrupt+0xf/0xf
>    [   51.007129]  </IRQ>
> 
>    2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
>    when using NVMEoF in order to solve lock ordering between neigh->ha_lock
>    seqlock and the nvme queue lock:
> 
>    [   77.833783]  Possible interrupt unsafe locking scenario:
>    [   77.833783]
>    [   77.835831]        CPU0                    CPU1
>    [   77.837129]        ----                    ----
>    [   77.838313]   lock(&(&n->ha_lock)->seqcount);
>    [   77.839550]                                local_irq_disable();
>    [   77.841377] lock(&(&nvmeq->q_lock)->rlock);
>    [   77.843222] lock(&(&n->ha_lock)->seqcount);
>    [   77.845178]   <Interrupt>
>    [   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
>    [   77.847986]
>    [   77.847986]  *** DEADLOCK ***
> 
>    3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:
> 
>    [   47.634271]  Possible interrupt unsafe locking scenario:
>    [   47.634271]
>    [   47.636452]        CPU0                    CPU1
>    [   47.637861]        ----                    ----
>    [   47.639285]   lock(&(&sch->q.lock)->rlock);
>    [   47.640654]                                local_irq_disable();
>    [   47.642451] lock(&(&nvmeq->q_lock)->rlock);
>    [   47.644521] lock(&(&sch->q.lock)->rlock);
>    [   47.646480]   <Interrupt>
>    [   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
>    [   47.648492]
>    [   47.648492]  *** DEADLOCK ***
> 
>    Using NVMEoF after this patch seems to finally be stable, without it,
>    rxe eventually deadlocks the whole system and causes RCU stalls.
> 
>    Signed-off-by: Alexandru Moise <00moses.alexander00 at gmail.com>
>    Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
>    Signed-off-by: Doug Ledford <dledford at redhat.com>
> --


* NVMe over Fabrics I/O Error with RDMA/RXE
  2018-07-11  9:44   ` Bairen YI
@ 2018-07-11 10:37     ` Sagi Grimberg
  2018-07-12  5:54       ` Bairen YI
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2018-07-11 10:37 UTC (permalink / raw)



> The exact kernel version is 4.15.0-23.25 as in
> 
> https://packages.ubuntu.com/bionic/linux-image-4.15.0-23-generic
> 
> I tried again with the latest Debian unstable (kernel version 4.16.5) and the symptoms remain the same.

Looking at linux-stable, it looks like the suspected missing commit [1]
entered in tag v4.16.17, so I'm not sure it exists in your tested kernel.
I'm not sure where I can access the source to verify.

[1]
>> Possible suspect might be:
>> --
>> commit 2da36d44a9d54a2c6e1f8da1f7ccc26b0bc6cfec
>> Author: Jianchao Wang <jianchao.w.wang at oracle.com>
>> Date:   Thu Apr 26 11:52:39 2018 +0800
>>
>>     IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
>>
>>     w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
>>     will not be updated in update_wqe_psn, and the corresponding
>>     wqe will not be acked in rxe_completer due to its last_psn is
>>     zero. Finally, the other wqe will also not be able to be acked,
>>     because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
>>     is still there. This causes large amount of io timeout when
>>     nvmeof is over rxe.
>>
>>     Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.
>>
>>     Signed-off-by: Jianchao Wang <jianchao.w.wang at oracle.com>
>>     Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
>>     Signed-off-by: Doug Ledford <dledford at redhat.com>
>> --
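
For reference, the linux-stable check amounts to something like this (a sketch; the stable backport carries a different hash than the mainline commit quoted above, so search by subject rather than by hash):

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git log --oneline v4.16.5..v4.16.17 --grep='RXE_START_MASK' -- drivers/infiniband/sw/rxe
# the distro kernel source (apt-get source <kernel package>) can be grepped
# the same way to see whether the backport is included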


* NVMe over Fabrics I/O Error with RDMA/RXE
  2018-07-11 10:37     ` Sagi Grimberg
@ 2018-07-12  5:54       ` Bairen YI
  0 siblings, 0 replies; 6+ messages in thread
From: Bairen YI @ 2018-07-12  5:54 UTC (permalink / raw)


Hi Sagi,

Thanks! It works with kernel version 4.17.3-1.

$ uname -v
#1 SMP Debian 4.17.3-1 (2018-07-02)

Best,
Bairen

> On 11 Jul 2018, at 18:37, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> 
>> The exact kernel version is 4.15.0-23.25 as in
>> https://packages.ubuntu.com/bionic/linux-image-4.15.0-23-generic
>> I tried again with the latest Debian unstable (kernel version 4.16.5) and the symptoms remain the same.
> 
> Looking at linux-stable, looks like the suspected missing commit [1]
> entered in tag v4.16.17, so I'm not sure it exists in your tested
> kernel, Not sure where I can access the source to verify.
> 
> [1]
>>> Possible suspect might be:
>>> --
>>> commit 2da36d44a9d54a2c6e1f8da1f7ccc26b0bc6cfec
>>> Author: Jianchao Wang <jianchao.w.wang at oracle.com>
>>> Date:   Thu Apr 26 11:52:39 2018 +0800
>>> 
>>>    IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
>>> 
>>>    w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
>>>    will not be updated in update_wqe_psn, and the corresponding
>>>    wqe will not be acked in rxe_completer due to its last_psn is
>>>    zero. Finally, the other wqe will also not be able to be acked,
>>>    because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
>>>    is still there. This causes large amount of io timeout when
>>>    nvmeof is over rxe.
>>> 
>>>    Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.
>>> 
>>>    Signed-off-by: Jianchao Wang <jianchao.w.wang at oracle.com>
>>>    Reviewed-by: Zhu Yanjun <yanjun.zhu at oracle.com>
>>>    Signed-off-by: Doug Ledford <dledford at redhat.com>
>>> --

