* nvmf/rdma host crash during heavy load and keep alive recovery
@ 2016-07-29 21:40 Steve Wise
  2016-08-01 11:06 ` Christoph Hellwig
  0 siblings, 1 reply; 53+ messages in thread
From: Steve Wise @ 2016-07-29 21:40 UTC (permalink / raw)

Running many fio jobs on 10 NVMF/RDMA ram disks, and bringing down and back
up the interfaces in a loop, uncovers this crash.  I'm not sure if this has
been reported/fixed?  I'm using the for-linus branch of linux-block + sagi's
5 patches on the host.

What this test tickles is keep-alive recovery in the presence of heavy
raw/direct IO.  Before the crash there are lots of these logged, which is
probably expected:

[ 295.497642] blk_update_request: I/O error, dev nvme6n1, sector 21004
[ 295.497643] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.497644] blk_update_request: I/O error, dev nvme6n1, sector 10852
[ 295.497646] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.497647] blk_update_request: I/O error, dev nvme6n1, sector 32004
[ 295.497653] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.497655] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.497658] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.497660] nvme nvme6: nvme_rdma_post_send failed with error code -22

and these right before the crash:

[ 295.591290] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.591291] nvme nvme6: Queueing INV WR for rkey 0x2a26eee failed (-22)
[ 295.591316] nvme nvme6: nvme_rdma_post_send failed with error code -22
[ 295.591317] nvme nvme6: Queueing INV WR for rkey 0x2a26eef failed (-22)
[ 295.591342] nvme nvme6: nvme_rdma_post_send failed with error code -22

The crash is a GPF because the qp has been freed and, due to the debug
kernel's slab poisoning, the qp struct memory is 0x6b6b6b6b...  Any ideas?
Here's the log:

[ 295.642191] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 295.642228] Modules linked in: nvme_rdma nvme_fabrics brd iw_cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 cxgb4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 sg lpc_ich mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler wmi
[ 295.642236]  ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[ 295.642239] CPU: 8 PID: 18390 Comm: blkid Tainted: G            E   4.7.0-block-for-linus-debug+ #11
[ 295.642240] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[ 295.642242] task: ffff880fb3418040 ti: ffff880fc06cc000 task.ti: ffff880fc06cc000
[ 295.642248] RIP: 0010:[<ffffffffa07fc083>]  [<ffffffffa07fc083>] nvme_rdma_post_send+0x83/0xd0 [nvme_rdma]
[ 295.642249] RSP: 0018:ffff880fc06cf7c8  EFLAGS: 00010286
[ 295.642250] RAX: 6b6b6b6b6b6b6b6b RBX: ffff880fc37da698 RCX: 0000000000000001
[ 295.642251] RDX: ffff880fc06cf7f0 RSI: ffff880fbf715d00 RDI: ffff880fc0afdfa8
[ 295.642252] RBP: ffff880fc06cf808 R08: ffff880fbf715d00 R09: 0000000000000000
[ 295.642253] R10: 0000000fb9f9b000 R11: ffff880fbf715d58 R12: ffff880fc37da698
[ 295.642254] R13: ffff880fbf715cb0 R14: ffff880fee6522a8 R15: ffff880fbf6d1138
[ 295.642255] FS:  00007f9abfe3e740(0000) GS:ffff881037000000(0000) knlGS:0000000000000000
[ 295.642256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 295.642257] CR2: 0000003f691422e9 CR3: 0000000fcaab1000 CR4: 00000000000406e0
[ 295.642258] Stack:
[ 295.642260]  0000000000000000 ffff880fbf715cb8 ffff880fbf715cd0 0000000200000001
[ 295.642262]  ffff880f00000000 ffff880fbf715cb0 ffff880fbf715b40 ffff880fc37da698
[ 295.642264]  ffff880fc06cf858 ffffffffa07fdc92 ffff880fbf715b40 ffff880fc0eac748
[ 295.642264] Call Trace:
[ 295.642268]  [<ffffffffa07fdc92>] nvme_rdma_queue_rq+0x172/0x280 [nvme_rdma]
[ 295.642273]  [<ffffffff81339e97>] blk_mq_make_request+0x2d7/0x560
[ 295.642277]  [<ffffffff8119a5c5>] ? mempool_alloc_slab+0x15/0x20
[ 295.642280]  [<ffffffff8132c6bc>] generic_make_request+0xfc/0x1d0
[ 295.642282]  [<ffffffff8132c7f0>] submit_bio+0x60/0x130
[ 295.642286]  [<ffffffff81264541>] submit_bh_wbc+0x161/0x1c0
[ 295.642288]  [<ffffffff812645b3>] submit_bh+0x13/0x20
[ 295.642291]  [<ffffffff81264eae>] block_read_full_page+0x20e/0x3e0
[ 295.642293]  [<ffffffff81268c83>] ? __blkdev_get+0x193/0x420
[ 295.642295]  [<ffffffff81266f60>] ? I_BDEV+0x20/0x20
[ 295.642297]  [<ffffffff81199228>] ? pagecache_get_page+0x38/0x240
[ 295.642300]  [<ffffffff811deae6>] ? page_add_file_rmap+0x56/0x160
[ 295.642304]  [<ffffffff81111b3e>] ? is_module_text_address+0xe/0x20
[ 295.642307]  [<ffffffff81267908>] blkdev_readpage+0x18/0x20
[ 295.642309]  [<ffffffff81199b9a>] do_generic_file_read+0x20a/0x710
[ 295.642314]  [<ffffffff8115566b>] ? rb_reserve_next_event+0xdb/0x230
[ 295.642317]  [<ffffffff81040fbf>] ? save_stack_trace+0x2f/0x50
[ 295.642319]  [<ffffffff81154baa>] ? rb_commit+0x10a/0x1a0
[ 295.642321]  [<ffffffff8119a15c>] generic_file_read_iter+0xbc/0x110
[ 295.642323]  [<ffffffff81154c64>] ? ring_buffer_unlock_commit+0x24/0xb0
[ 295.642326]  [<ffffffff81267277>] blkdev_read_iter+0x37/0x40
[ 295.642329]  [<ffffffff8122b37c>] __vfs_read+0xfc/0x120
[ 295.642331]  [<ffffffff8122b44e>] vfs_read+0xae/0xf0
[ 295.642335]  [<ffffffff812492a3>] ? __fdget+0x13/0x20
[ 295.642337]  [<ffffffff8122bf56>] SyS_read+0x56/0xc0
[ 295.642341]  [<ffffffff81003f5d>] do_syscall_64+0x7d/0x230
[ 295.642345]  [<ffffffff8106f397>] ? do_page_fault+0x37/0x90
[ 295.642350]  [<ffffffff816dbda1>] entry_SYSCALL64_slow_path+0x25/0x25
[ 295.642367] Code: 47 08 83 c0 01 a8 1f 88 47 08 74 5b 45 84 c9 75 56 4d 85 c0 74 5a 48 8d 45 c0 49 89 00 48 8b 7b 30 48 8d 55 e8 4c 89 c6 48 8b 07 <ff> 90 c8 01 00 00 85 c0 41 89 c4 74 23 48 8b 43 18 44 89 e1 48
[ 295.642371] RIP  [<ffffffffa07fc083>] nvme_rdma_post_send+0x83/0xd0 [nvme_rdma]
[ 295.642371]  RSP <ffff880fc06cf7c8>

crash> mod -s nvme_rdma
     MODULE       NAME                 SIZE  OBJECT FILE
ffffffffa0801240  nvme_rdma           28672  /lib/modules/4.7.0-block-for-linus-debug+/kernel/drivers/nvme/host/nvme-rdma.ko
crash> gdb list *nvme_rdma_post_send+0x83
0xffffffffa07fc083 is in nvme_rdma_post_send (include/rdma/ib_verbs.h:2619).
2614     */
2615    static inline int ib_post_send(struct ib_qp *qp,
2616                                   struct ib_send_wr *send_wr,
2617                                   struct ib_send_wr **bad_send_wr)
2618    {
2619            return qp->device->post_send(qp, send_wr, bad_send_wr);
2620    }
2621
2622    /**
2623     * ib_post_recv - Posts a list of work requests to the receive queue of
crash>

^ permalink raw reply	[flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-07-29 21:40 nvmf/rdma host crash during heavy load and keep alive recovery Steve Wise
@ 2016-08-01 11:06 ` Christoph Hellwig
  2016-08-01 14:26   ` Steve Wise
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2016-08-01 11:06 UTC (permalink / raw)

On Fri, Jul 29, 2016@04:40:40PM -0500, Steve Wise wrote:
> Running many fio jobs on 10 NVMF/RDMA ram disks, and bringing down and back up
> the interfaces in a loop uncovers this crash.  I'm not sure if this has been
> reported/fixed?  I'm using the for-linus branch of linux-block + sagi's 5
> patches on the host.
>
> What this test tickles is keep-alive recovery in the presence of heavy
> raw/direct IO.  Before the crash there are lots of these logged, which is
> probably expected:

With what fixes does this happen?  This looks pretty similar to an
issue you reported before.
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-01 11:06 ` Christoph Hellwig
@ 2016-08-01 14:26   ` Steve Wise
  2016-08-01 21:38     ` Steve Wise
       [not found]     ` <015801d1ec3d$0ca07ea0$25e17be0$@opengridcomputing.com>
  0 siblings, 2 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-01 14:26 UTC (permalink / raw)

> On Fri, Jul 29, 2016@04:40:40PM -0500, Steve Wise wrote:
> > Running many fio jobs on 10 NVMF/RDMA ram disks, and bringing down and
> > back up the interfaces in a loop uncovers this crash.  I'm not sure if
> > this has been reported/fixed?  I'm using the for-linus branch of
> > linux-block + sagi's 5 patches on the host.
> >
> > What this test tickles is keep-alive recovery in the presence of heavy
> > raw/direct IO.  Before the crash there are lots of these logged, which is
> > probably expected:
>
> With what fixes does this happen?  This looks pretty similar to an
> issue you reported before.

As I said, I'm using the for-linus branch of the linux-block repo
(git://git.kernel.dk/linux-block) + sagi's 5 recent patches.  So I should be
using the latest and greatest, I think.  This problem was originally seen on
nvmf-all.3 as well.  Perhaps I have reported this previously.  But now I'm
trying to fix it :)
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-01 14:26   ` Steve Wise
@ 2016-08-01 21:38     ` Steve Wise
       [not found]     ` <015801d1ec3d$0ca07ea0$25e17be0$@opengridcomputing.com>
  1 sibling, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-01 21:38 UTC (permalink / raw)

> > On Fri, Jul 29, 2016@04:40:40PM -0500, Steve Wise wrote:
> > > Running many fio jobs on 10 NVMF/RDMA ram disks, and bringing down and
> > > back up the interfaces in a loop uncovers this crash.  I'm not sure if
> > > this has been reported/fixed?  I'm using the for-linus branch of
> > > linux-block + sagi's 5 patches on the host.
> > >
> > > What this test tickles is keep-alive recovery in the presence of heavy
> > > raw/direct IO.  Before the crash there are lots of these logged, which
> > > is probably expected:
> >
> > With what fixes does this happen?  This looks pretty similar to an
> > issue you reported before.
>
> As I said, I'm using the for-linus branch of the linux-block repo
> (git://git.kernel.dk/linux-block) + sagi's 5 recent patches.  So I should be
> using the latest and greatest, I think.  This problem was originally seen on
> nvmf-all.3 as well.  Perhaps I have reported this previously.  But now I'm
> trying to fix it :)
>

I do have two different problem reports internally at Chelsio that both show
the same signature.  I found the other one :)

For the 2nd problem report, there was no ifup/down to induce keep alive
recovery.  It just loads up 10 ram disks on a 64 core host/target pair in a
similar manner, and after a while lots of nvme_rdma_post_send() errors are
logged (probably due to a connection death) and then the crash.  I'm still
gathering info on that one, but it appears the qp again was freed somehow and
then attempts to post to it cause the crash...

Steve
* nvmf/rdma host crash during heavy load and keep alive recovery
       [not found] ` <015801d1ec3d$0ca07ea0$25e17be0$@opengridcomputing.com>
@ 2016-08-10 15:46   ` Steve Wise
       [not found]     ` <010f01d1f31e$50c8cb40$f25a61c0$@opengridcomputing.com>
  1 sibling, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-10 15:46 UTC (permalink / raw)

Hey guys, I've rebased the nvmf-4.8-rc branch on top of 4.8-rc2 so I have
the latest/greatest, and continued debugging this crash.  I see:

0) 10 ram disks attached via nvmf/iw_cxgb4, and fio started on all 10
   disks.  This node has 8 cores, so that is 80 connections.
1) the cxgb4 interface brought down a few seconds later
2) kato fires on all connections
3) the interface is brought back up 8 seconds after #1
4) 10 seconds after #2 all the qps are destroyed
5) reconnects start happening
6) a blk request is executed and the nvme_rdma_request struct still has a
   pointer to one of the qps destroyed in 3 and whamo...

I'm digging into the request cancel logic.  Any ideas/help is greatly
appreciated...

Thanks,

Steve.
* nvmf/rdma host crash during heavy load and keep alive recovery
       [not found] ` <010f01d1f31e$50c8cb40$f25a61c0$@opengridcomputing.com>
@ 2016-08-10 16:00   ` Steve Wise
       [not found]     ` <013701d1f320$57b185d0$07149170$@opengridcomputing.com>
  1 sibling, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-10 16:00 UTC (permalink / raw)

> Hey guys, I've rebased the nvmf-4.8-rc branch on top of 4.8-rc2 so I have
> the latest/gratest, and continued debugging this crash.  I see:
>
> 0) 10 ram disks attached via nvmf/iw_cxgb4, and fio started on all 10
>    disks.  This node has 8 cores, so that is 80 connections.
> 1) the cxgb4 interface brought down a few seconds later
> 2) kato fires on all connections
> 3) the interface is brought back up 8 seconds after #1
> 4) 10 seconds after #2 all the qps are destroyed
> 5) reconnects start happening
> 6) a blk request is executed and the nvme_rdma_request struct still has a
>    pointer to one of the qps destroyed in 3 and whamo...
>
> I'm digging into the request cancel logic.  Any ideas/help is greatly
> appreciated...
>
> Thanks,
>
> Steve.

Here is the stack that crashed processing a blk request:

crash> bt
PID: 402    TASK: ffff880397968040  CPU: 0   COMMAND: "kworker/0:1H"
 #0 [ffff8803970f7800] machine_kexec at ffffffff8105fc40
 #1 [ffff8803970f7870] __crash_kexec at ffffffff81116908
 #2 [ffff8803970f7940] crash_kexec at ffffffff811169dd
 #3 [ffff8803970f7970] oops_end at ffffffff81032be6
 #4 [ffff8803970f79a0] die at ffffffff810330db
 #5 [ffff8803970f79d0] do_general_protection at ffffffff81030144
 #6 [ffff8803970f7a00] general_protection at ffffffff816e4ca8
    [exception RIP: nvme_rdma_post_send+131]
    RIP: ffffffffa0414083  RSP: ffff8803970f7ab8  RFLAGS: 00010246
    RAX: 6b6b6b6b6b6b6b6b  RBX: ffff8802dd923598  RCX: 0000000000000002
    RDX: ffff8803970f7ae0  RSI: ffff8803970f7ab8  RDI: ffff8802dd9fc518
    RBP: ffff8803970f7af8   R8: ffff8803970f7ab8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff8802dde6ef58  R12: ffff8802dd923598
    R13: ffff8802dde6eeb0  R14: ffff880399f4c548  R15: ffff8802dde59db8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8803970f7b00] nvme_rdma_queue_rq at ffffffffa0415c72 [nvme_rdma]
 #8 [ffff8803970f7b50] __blk_mq_run_hw_queue at ffffffff81338324
 #9 [ffff8803970f7ca0] blk_mq_run_work_fn at ffffffff81338552
#10 [ffff8803970f7cb0] process_one_work at ffffffff810a1593
#11 [ffff8803970f7d90] worker_thread at ffffffff810a222d
#12 [ffff8803970f7ec0] kthread at ffffffff810a6d6c
#13 [ffff8803970f7f50] ret_from_fork at ffffffff816e2cbf

Here is the nvme_rdma_request:

crash> nvme_rdma_request ffff8802dde6eeb0
struct nvme_rdma_request {
  mr = 0xffff8802dde5c008,
  sqe = {
    cqe = {
      done = 0xffffffffa0414320 <nvme_rdma_send_done>
    },
    data = 0xffff8802dde59db8,
    dma = 12312747448
  },
  sge = {{
      addr = 12312747448,
      length = 64,
      lkey = 0
    }, {
      addr = 12138727424,
      length = 2048,
      lkey = 0
    }},
  num_sge = 2,
  nents = 1,
  inline_data = true,
  need_inval = false,
  reg_wr = {
    wr = {
      next = 0x0,
      {
        wr_id = 0,
        wr_cqe = 0x0
      },
      sg_list = 0x0,
      num_sge = 0,
      opcode = IB_WR_RDMA_WRITE,
      send_flags = 0,
      ex = {
        imm_data = 0,
        invalidate_rkey = 0
      }
    },
    mr = 0x0,
    key = 0,
    access = 0
  },
  reg_cqe = {
    done = 0x0
  },
  queue = 0xffff8802dd923598,
  sg_table = {
    sgl = 0xffff8802dde6ef58,
    nents = 1,
    orig_nents = 1
  },
  first_sgl = 0xffff8802dde6ef58
}

And here is the nvme_rdma_queue:

crash> nvme_rdma_queue 0xffff8802dd923598
struct nvme_rdma_queue {
  rsp_ring = 0xffff8802dd968008,
  sig_count = 200 '\310',
  queue_size = 128,
  cmnd_capsule_len = 4160,
  ctrl = 0xffff8802dbd5d3d8,
  device = 0xffff880384ceb5e8,
  ib_cq = 0xffff8802dd9d2e68,
  qp = 0xffff8802dd9fc518,
  flags = 0,
  cm_id = 0xffff8802dd9f8008,
  cm_error = 0,
  cm_done = {
    done = 0,
    wait = {
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      },
      task_list = {
        next = 0xffff8802dd9235f8,
        prev = 0xffff8802dd9235f8
      }
    }
  }
}

And see here the ib_qp has been freed:

crash> gdb x/8g 0xffff8802dd9fc518
0xffff8802dd9fc518:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
0xffff8802dd9fc528:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
0xffff8802dd9fc538:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
0xffff8802dd9fc548:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
* nvmf/rdma host crash during heavy load and keep alive recovery
       [not found] ` <013701d1f320$57b185d0$07149170$@opengridcomputing.com>
@ 2016-08-10 17:20   ` Steve Wise
  2016-08-10 18:59     ` Steve Wise
  0 siblings, 1 reply; 53+ messages in thread
From: Steve Wise @ 2016-08-10 17:20 UTC (permalink / raw)

> Here is the stack that crashed processing a blk request:
>
> crash> bt
> PID: 402    TASK: ffff880397968040  CPU: 0   COMMAND: "kworker/0:1H"
>  #0 [ffff8803970f7800] machine_kexec at ffffffff8105fc40
>  #1 [ffff8803970f7870] __crash_kexec at ffffffff81116908
>  #2 [ffff8803970f7940] crash_kexec at ffffffff811169dd
>  #3 [ffff8803970f7970] oops_end at ffffffff81032be6
>  #4 [ffff8803970f79a0] die at ffffffff810330db
>  #5 [ffff8803970f79d0] do_general_protection at ffffffff81030144
>  #6 [ffff8803970f7a00] general_protection at ffffffff816e4ca8
>     [exception RIP: nvme_rdma_post_send+131]
>     RIP: ffffffffa0414083  RSP: ffff8803970f7ab8  RFLAGS: 00010246
>     RAX: 6b6b6b6b6b6b6b6b  RBX: ffff8802dd923598  RCX: 0000000000000002
>     RDX: ffff8803970f7ae0  RSI: ffff8803970f7ab8  RDI: ffff8802dd9fc518
>     RBP: ffff8803970f7af8   R8: ffff8803970f7ab8   R9: 0000000000000000
>     R10: 0000000000000000  R11: ffff8802dde6ef58  R12: ffff8802dd923598
>     R13: ffff8802dde6eeb0  R14: ffff880399f4c548  R15: ffff8802dde59db8
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff8803970f7b00] nvme_rdma_queue_rq at ffffffffa0415c72 [nvme_rdma]
>  #8 [ffff8803970f7b50] __blk_mq_run_hw_queue at ffffffff81338324
>  #9 [ffff8803970f7ca0] blk_mq_run_work_fn at ffffffff81338552
> #10 [ffff8803970f7cb0] process_one_work at ffffffff810a1593
> #11 [ffff8803970f7d90] worker_thread at ffffffff810a222d
> #12 [ffff8803970f7ec0] kthread at ffffffff810a6d6c
> #13 [ffff8803970f7f50] ret_from_fork at ffffffff816e2cbf
>
> Here is the nvme_rdma_request:
>
> crash> nvme_rdma_request ffff8802dde6eeb0
> struct nvme_rdma_request {
>   mr = 0xffff8802dde5c008,
>   sqe = {
>     cqe = {
>       done = 0xffffffffa0414320 <nvme_rdma_send_done>
>     },
>     data = 0xffff8802dde59db8,
>     dma = 12312747448
>   },
>   sge = {{
>       addr = 12312747448,
>       length = 64,
>       lkey = 0
>     }, {
>       addr = 12138727424,
>       length = 2048,
>       lkey = 0
>     }},
>   num_sge = 2,
>   nents = 1,
>   inline_data = true,
>   need_inval = false,
>   reg_wr = {
>     wr = {
>       next = 0x0,
>       {
>         wr_id = 0,
>         wr_cqe = 0x0
>       },
>       sg_list = 0x0,
>       num_sge = 0,
>       opcode = IB_WR_RDMA_WRITE,
>       send_flags = 0,
>       ex = {
>         imm_data = 0,
>         invalidate_rkey = 0
>       }
>     },
>     mr = 0x0,
>     key = 0,
>     access = 0
>   },
>   reg_cqe = {
>     done = 0x0
>   },
>   queue = 0xffff8802dd923598,
>   sg_table = {
>     sgl = 0xffff8802dde6ef58,
>     nents = 1,
>     orig_nents = 1
>   },
>   first_sgl = 0xffff8802dde6ef58
> }
>
> And here is the nvme_rdma_queue:
>
> crash> nvme_rdma_queue 0xffff8802dd923598
> struct nvme_rdma_queue {
>   rsp_ring = 0xffff8802dd968008,
>   sig_count = 200 '\310',
>   queue_size = 128,
>   cmnd_capsule_len = 4160,
>   ctrl = 0xffff8802dbd5d3d8,
>   device = 0xffff880384ceb5e8,
>   ib_cq = 0xffff8802dd9d2e68,
>   qp = 0xffff8802dd9fc518,
>   flags = 0,
>   cm_id = 0xffff8802dd9f8008,
>   cm_error = 0,
>   cm_done = {
>     done = 0,
>     wait = {
>       lock = {
>         {
>           rlock = {
>             raw_lock = {
>               val = {
>                 counter = 0
>               }
>             }
>           }
>         }
>       },
>       task_list = {
>         next = 0xffff8802dd9235f8,
>         prev = 0xffff8802dd9235f8
>       }
>     }
>   }
> }
>
> And see here the ib_qp has been freed:
>
> crash> gdb x/8g 0xffff8802dd9fc518
> 0xffff8802dd9fc518:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
> 0xffff8802dd9fc528:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
> 0xffff8802dd9fc538:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b
> 0xffff8802dd9fc548:     0x6b6b6b6b6b6b6b6b      0x6b6b6b6b6b6b6b6b

The nvme_rdma_ctrl queue associated with the request is in RECONNECTING
state:

ctrl = {
  state = NVME_CTRL_RECONNECTING,
  lock = {

So it should not be posting SQ WRs...
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-10 17:20   ` Steve Wise
@ 2016-08-10 18:59     ` Steve Wise
  2016-08-11  6:27       ` Sagi Grimberg
  0 siblings, 1 reply; 53+ messages in thread
From: Steve Wise @ 2016-08-10 18:59 UTC (permalink / raw)

> The nvme_rdma_ctrl queue associated with the request is in RECONNECTING
> state:
>
> ctrl = {
>   state = NVME_CTRL_RECONNECTING,
>   lock = {
>
> So it should not be posting SQ WRs...

kato kicks error recovery, nvme_rdma_error_recovery_work(), which calls
nvme_cancel_request() on each request.  nvme_cancel_request() sets req->errors
to NVME_SC_ABORT_REQ.  It then completes the request, which ends up at
nvme_rdma_complete_rq(), which queues it for retry:

...
        if (unlikely(rq->errors)) {
                if (nvme_req_needs_retry(rq, rq->errors)) {
                        nvme_requeue_req(rq);
                        return;
                }

                if (rq->cmd_type == REQ_TYPE_DRV_PRIV)
                        error = rq->errors;
                else
                        error = nvme_error_status(rq->errors);
        }
...

The retry will end up at nvme_rdma_queue_rq(), which will try to post a send
wr to a freed qp...

Should the canceled requests actually OR in bit NVME_SC_DNR?  That is only
done in nvme_cancel_request() if the blk queue is dying:

...
        status = NVME_SC_ABORT_REQ;
        if (blk_queue_dying(req->q))
                status |= NVME_SC_DNR;
...

Sagi, please put on your KATO hat and help! :)

Thanks,

Steve.
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-10 18:59     ` Steve Wise
@ 2016-08-11  6:27       ` Sagi Grimberg
  2016-08-11 13:58         ` Steve Wise
  0 siblings, 1 reply; 53+ messages in thread
From: Sagi Grimberg @ 2016-08-11 6:27 UTC (permalink / raw)

On 10/08/16 21:59, Steve Wise wrote:
>
>> The nvme_rdma_ctrl queue associated with the request is in RECONNECTING
>> state:
>>
>> ctrl = {
>>   state = NVME_CTRL_RECONNECTING,
>>   lock = {
>>
>> So it should not be posting SQ WRs...
>
> kato kicks error recovery, nvme_rdma_error_recovery_work(), which calls
> nvme_cancel_request() on each request.  nvme_cancel_request() sets req->errors
> to NVME_SC_ABORT_REQ.  It then completes the request which ends up at
> nvme_rdma_complete_rq() which queues it for retry:
> ...
>         if (unlikely(rq->errors)) {
>                 if (nvme_req_needs_retry(rq, rq->errors)) {
>                         nvme_requeue_req(rq);
>                         return;
>                 }
>
>                 if (rq->cmd_type == REQ_TYPE_DRV_PRIV)
>                         error = rq->errors;
>                 else
>                         error = nvme_error_status(rq->errors);
>         }
> ...
>
> The retry will end up at nvme_rdma_queue_rq() which will try and post a send wr
> to a freed qp...
>
> Should the canceled requests actually OR in bit NVME_SC_DNR?  That is only
> done in nvme_cancel_request() if the blk queue is dying:

the DNR bit should not be set normally, only when we either don't want
to requeue or we can't.

>
> ...
>         status = NVME_SC_ABORT_REQ;
>         if (blk_queue_dying(req->q))
>                 status |= NVME_SC_DNR;
> ...
>

couple of questions:

1. bringing down the interface means generating DEVICE_REMOVAL
event?

2. keep-alive timeout expires means that nvme_rdma_timeout() kicks
error_recovery and sets:
rq->errors = NVME_SC_ABORT_REQ | NVME_SC_DNR

So I'm not at all convinced that the keep-alive is the request that is
being re-issued.  Did you verify that?
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-11  6:27       ` Sagi Grimberg
@ 2016-08-11 13:58         ` Steve Wise
  2016-08-11 14:19           ` Steve Wise
  2016-08-11 14:40           ` Steve Wise
  0 siblings, 2 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-11 13:58 UTC (permalink / raw)

> >> The nvme_rdma_ctrl queue associated with the request is in RECONNECTING
> >> state:
> >>
> >> ctrl = {
> >>   state = NVME_CTRL_RECONNECTING,
> >>   lock = {
> >>
> >> So it should not be posting SQ WRs...
> >
> > kato kicks error recovery, nvme_rdma_error_recovery_work(), which calls
> > nvme_cancel_request() on each request.  nvme_cancel_request() sets req->errors
> > to NVME_SC_ABORT_REQ.  It then completes the request which ends up at
> > nvme_rdma_complete_rq() which queues it for retry:
> > ...
> >         if (unlikely(rq->errors)) {
> >                 if (nvme_req_needs_retry(rq, rq->errors)) {
> >                         nvme_requeue_req(rq);
> >                         return;
> >                 }
> >
> >                 if (rq->cmd_type == REQ_TYPE_DRV_PRIV)
> >                         error = rq->errors;
> >                 else
> >                         error = nvme_error_status(rq->errors);
> >         }
> > ...
> >
> > The retry will end up at nvme_rdma_queue_rq() which will try and post a send wr
> > to a freed qp...
> >
> > Should the canceled requests actually OR in bit NVME_SC_DNR?  That is only
> > done in nvme_cancel_request() if the blk queue is dying:
>
> the DNR bit should not be set normally, only when we either don't want
> to requeue or we can't.
>
> >
> > ...
> >         status = NVME_SC_ABORT_REQ;
> >         if (blk_queue_dying(req->q))
> >                 status |= NVME_SC_DNR;
> > ...
> >
>
> couple of questions:
>
> 1. bringing down the interface means generating DEVICE_REMOVAL
> event?
>

No.  Just ifconfig ethX down; sleep 10; ifconfig ethX up.  This simply causes
the pending work requests to take longer to complete and kicks in the kato
logic.

> 2. keep-alive timeout expires means that nvme_rdma_timeout() kicks
> error_recovery and sets:
> rq->errors = NVME_SC_ABORT_REQ | NVME_SC_DNR
>
> So I'm not at all convinced that the keep-alive is the request that is
> being re-issued.  Did you verify that?

The request that caused the crash had rq->errors == NVME_SC_ABORT_REQ.  I'm
not sure that is always the case though.  But this is very easy to reproduce,
so I should be able to drill down and add any debug code you think might
help.
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-11 13:58         ` Steve Wise
@ 2016-08-11 14:19           ` Steve Wise
  0 siblings, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-11 14:19 UTC (permalink / raw)

> > the DNR bit should not be set normally, only when we either don't want
> > to requeue or we can't.
>

So when a request is requeued, when is it restarted?  It is getting restarted
on a controller that is in recovery mode and hasn't set up the new qp.  So
the nvme_rdma_queue struct associated with the request is pointing to a freed
ib_qp...
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-11 13:58         ` Steve Wise
@ 2016-08-11 14:40           ` Steve Wise
  2016-08-11 15:53             ` Steve Wise
       [not found]             ` <00fe01d1f3e8$8992b330$9cb81990$@opengridcomputing.com>
  1 sibling, 2 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-11 14:40 UTC (permalink / raw)

> > couple of questions:
> >
> > 1. bringing down the interface means generating DEVICE_REMOVAL
> > event?
> >
>
> No.  Just ifconfig ethX down; sleep 10; ifconfig ethX up.  This simply causes
> the pending work requests to take longer to complete and kicks in the kato
> logic.
>
> > 2. keep-alive timeout expires means that nvme_rdma_timeout() kicks
> > error_recovery and sets:
> > rq->errors = NVME_SC_ABORT_REQ | NVME_SC_DNR
> >
> > So I'm not at all convinced that the keep-alive is the request that is
> > being re-issued.  Did you verify that?
>
> The request that caused the crash had rq->errors == NVME_SC_ABORT_REQ.  I'm
> not sure that is always the case though.  But this is very easy to reproduce,
> so I should be able to drill down and add any debug code you think might
> help.
>

Here is the nvme_rdma_request and request that caused the crash this time:

crash> nvme_rdma_request ffff8801c7e3a5b0
struct nvme_rdma_request {
  mr = 0xffff8801c7e1d298,
  sqe = {
    cqe = {
      done = 0xffffffffa0853320 <nvme_rdma_send_done>
    },
    data = 0xffff8801c7e1b2c8,
    dma = 7648424648
  },
  sge = {{
      addr = 7648424648,
      length = 64,
      lkey = 0
    }, {
      addr = 7814166528,
      length = 2048,
      lkey = 0
    }},
  num_sge = 2,
  nents = 1,
  inline_data = true,
  need_inval = false,
  reg_wr = {
    wr = {
      next = 0x0,
      {
        wr_id = 0,
        wr_cqe = 0x0
      },
      sg_list = 0x0,
      num_sge = 0,
      opcode = IB_WR_RDMA_WRITE,
      send_flags = 0,
      ex = {
        imm_data = 0,
        invalidate_rkey = 0
      }
    },
    mr = 0x0,
    key = 0,
    access = 0
  },
  reg_cqe = {
    done = 0x0
  },
  queue = 0xffff8801d6484038,
  sg_table = {
    sgl = 0xffff8801c7e3a658,
    nents = 1,
    orig_nents = 1
  },
  first_sgl = 0xffff8801c7e3a658
}
crash> nvme_rdma_queue 0xffff8801d6484038
struct nvme_rdma_queue {
  rsp_ring = 0xffff8801ce699158,
  sig_count = 147 '\223',
  queue_size = 128,
  cmnd_capsule_len = 4160,
  ctrl = 0xffff8801d6308958,
  device = 0xffff8803836f44b8,
  ib_cq = 0xffff8801d65366b8,
  qp = 0xffff8801d6524518,
  flags = 0,
  cm_id = 0xffff8801d6525508,
  cm_error = 0,
  cm_done = {
    done = 0,
    wait = {
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      },
      task_list = {
        next = 0xffff8801d6484098,
        prev = 0xffff8801d6484098
      }
    }
  }
}
crash> gdb p ((struct request *)0xffff8801c7e3a5b0 - 1)
$1 = (struct request *) 0xffff8801c7e3a440
crash> request 0xffff8801c7e3a440
struct request {
  queuelist = {
    next = 0xffff8801c7e3a440,
    prev = 0xffff8801c7e3a440
  },
  {
    csd = {
      llist = {
        next = 0xffff8801c7e3b690
      },
      func = 0xffffffff81336c00 <__blk_mq_complete_request_remote>,
      info = 0xffff8801c7e3a440,
      flags = 0
    },
    fifo_time = 18446612139962775184
  },
  q = 0xffff8801d636dbb8,
  mq_ctx = 0xffffe8ffffc04100,
  cpu = -1,
  cmd_type = 1,
  cmd_flags = 2305843009784119304,
  atomic_flags = 2,
  __data_len = 2048,
  __sector = 13992,
  bio = 0xffff8801c34565c0,
  biotail = 0xffff8801c34565c0,
  {
    hash = {
      next = 0x0,
      pprev = 0x0
    },
    ipi_list = {
      next = 0x0,
      prev = 0x0
    }
  },
  {
    rb_node = {
      __rb_parent_color = 18446612139962770632,
      rb_right = 0x0,
      rb_left = 0x0
    },
    completion_data = 0xffff8801c7e3a4c8
  },
  {
    elv = {
      icq = 0x0,
      priv = {0x0, 0x0}
    },
    flush = {
      seq = 0,
      list = {
        next = 0x0,
        prev = 0x0
      },
      saved_end_io = 0x0
    }
  },
  rq_disk = 0xffff8801d63092a8,
  part = 0xffff8801d63092f0,
  start_time = 4361672618,
  rl = 0x0,
  start_time_ns = 67003651063676,
  io_start_time_ns = 0,
  nr_phys_segments = 1,
  nr_integrity_segments = 0,
  ioprio = 0,
  special = 0x0,
  tag = 121,
  errors = 7,
  __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  cmd = 0xffff8801c7e3a548 "",
  cmd_len = 0,
  extra_len = 0,
  sense_len = 0,
  resid_len = 2048,
  sense = 0x0,
  deadline = 4361722943,
  timeout_list = {
    next = 0xffff8801c7e3a580,
    prev = 0xffff8801c7e3a580
  },
  timeout = 30000,
  retries = 0,
  end_io = 0x0,
  end_io_data = 0x0,
  next_rq = 0x0
}
* nvmf/rdma host crash during heavy load and keep alive recovery
  2016-08-11 14:40           ` Steve Wise
@ 2016-08-11 15:53             ` Steve Wise
  0 siblings, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-08-11 15:53 UTC (permalink / raw)

> > The request that caused the crash had rq->errors == NVME_SC_ABORT_REQ.  I'm
> > not sure that is always the case though.  But this is very easy to
> > reproduce, so I should be able to drill down and add any debug code you
> > think might help.
> >
>
> Here is the nvme_rdma_request and request that caused the crash this time:
>
> crash> nvme_rdma_request ffff8801c7e3a5b0
> struct nvme_rdma_request {
>   mr = 0xffff8801c7e1d298,
>   sqe = {
>     cqe = {
>       done = 0xffffffffa0853320 <nvme_rdma_send_done>
>     },
>     data = 0xffff8801c7e1b2c8,
>     dma = 7648424648
>   },
>   sge = {{
>       addr = 7648424648,
>       length = 64,
>       lkey = 0
>     }, {
>       addr = 7814166528,
>       length = 2048,
>       lkey = 0
>     }},
>   num_sge = 2,
>   nents = 1,
>   inline_data = true,
>   need_inval = false,
>   reg_wr = {
>     wr = {
>       next = 0x0,
>       {
>         wr_id = 0,
>         wr_cqe = 0x0
>       },
>       sg_list = 0x0,
>       num_sge = 0,
>       opcode = IB_WR_RDMA_WRITE,
>       send_flags = 0,
>       ex = {
>         imm_data = 0,
>         invalidate_rkey = 0
>       }
>     },
>     mr = 0x0,
>     key = 0,
>     access = 0
>   },
>   reg_cqe = {
>     done = 0x0
>   },
>   queue = 0xffff8801d6484038,
>   sg_table = {
>     sgl = 0xffff8801c7e3a658,
>     nents = 1,
>     orig_nents = 1
>   },
>   first_sgl = 0xffff8801c7e3a658
> }
> crash> nvme_rdma_queue 0xffff8801d6484038
> struct nvme_rdma_queue {
>   rsp_ring = 0xffff8801ce699158,
>   sig_count = 147 '\223',
>   queue_size = 128,
>   cmnd_capsule_len = 4160,
>   ctrl = 0xffff8801d6308958,
>   device = 0xffff8803836f44b8,
>   ib_cq = 0xffff8801d65366b8,
>   qp = 0xffff8801d6524518,
>   flags = 0,
>   cm_id = 0xffff8801d6525508,
>   cm_error = 0,
>   cm_done = {
>     done = 0,
>     wait = {
>       lock = {
>         {
>           rlock = {
>             raw_lock = {
>               val = {
>                 counter = 0
>               }
>             }
>           }
>         }
>       },
>       task_list = {
>         next = 0xffff8801d6484098,
>         prev = 0xffff8801d6484098
>       }
>     }
>   }
> }
> crash> gdb p ((struct request *)0xffff8801c7e3a5b0 - 1)
> $1 = (struct request *) 0xffff8801c7e3a440
> crash> request 0xffff8801c7e3a440
> struct request {
>   queuelist = {
>     next = 0xffff8801c7e3a440,
>     prev = 0xffff8801c7e3a440
>   },
>   {
>     csd = {
>       llist = {
>         next = 0xffff8801c7e3b690
>       },
>       func = 0xffffffff81336c00 <__blk_mq_complete_request_remote>,
>       info = 0xffff8801c7e3a440,
>       flags = 0
>     },
>     fifo_time = 18446612139962775184
>   },
>   q = 0xffff8801d636dbb8,
>   mq_ctx = 0xffffe8ffffc04100,
>   cpu = -1,
>   cmd_type = 1,
>   cmd_flags = 2305843009784119304,
>   atomic_flags = 2,
>   __data_len = 2048,
>   __sector = 13992,
>   bio = 0xffff8801c34565c0,
>   biotail = 0xffff8801c34565c0,
>   {
>     hash = {
>       next = 0x0,
>       pprev = 0x0
>     },
>     ipi_list = {
>       next = 0x0,
>       prev = 0x0
>     }
>   },
>   {
>     rb_node = {
>       __rb_parent_color = 18446612139962770632,
>       rb_right = 0x0,
>       rb_left = 0x0
>     },
>     completion_data = 0xffff8801c7e3a4c8
>   },
>   {
>     elv = {
>       icq = 0x0,
>       priv = {0x0, 0x0}
>     },
>     flush = {
>       seq = 0,
>       list = {
>         next = 0x0,
>         prev = 0x0
>       },
>       saved_end_io = 0x0
>     }
>   },
>   rq_disk = 0xffff8801d63092a8,
>   part = 0xffff8801d63092f0,
>   start_time = 4361672618,
>   rl = 0x0,
>   start_time_ns = 67003651063676,
>   io_start_time_ns = 0,
>   nr_phys_segments = 1,
>   nr_integrity_segments = 0,
>   ioprio = 0,
>   special = 0x0,
>   tag = 121,
>   errors = 7,
>   __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>   cmd = 0xffff8801c7e3a548 "",
>   cmd_len = 0,
>   extra_len = 0,
>   sense_len = 0,
>   resid_len = 2048,
>   sense = 0x0,
>   deadline = 4361722943,
>   timeout_list = {
>     next = 0xffff8801c7e3a580,
>     prev = 0xffff8801c7e3a580
>   },
>   timeout = 30000,
>   retries = 0,
>   end_io = 0x0,
>   end_io_data = 0x0,
>   next_rq = 0x0
> }

Ah, I see the nvme_rdma worker thread running nvme_rdma_reconnect_ctrl_work()
on the same nvme_rdma_queue that is handling the request and crashing:

crash> bt 371
PID: 371    TASK: ffff8803975a4300  CPU: 5   COMMAND: "kworker/5:2"
    [exception RIP: set_track+16]
    RIP: ffffffff81202070  RSP: ffff880397f2ba18  RFLAGS: 00000086
    RAX: 0000000000000001  RBX: ffff88039f407a00  RCX: ffffffffa0853234
    RDX: 0000000000000001  RSI: ffff8801d663e008  RDI: ffff88039f407a00
    RBP: ffff880397f2ba48   R8: ffff8801d663e158   R9: 000000000000005a
    R10: 00000000000000cc  R11: 0000000000000000  R12: ffff8801d663e008
    R13: ffffea0007598f80  R14: 0000000000000001  R15: ffff8801d663e008
    CS: 0010  SS: 0018
 #0 [ffff880397f2ba50] free_debug_processing at ffffffff81204820
 #1 [ffff880397f2bad0] __slab_free at ffffffff81204bfb
 #2 [ffff880397f2bb90] kfree at ffffffff81204dcd
 #3 [ffff880397f2bc00] nvme_rdma_free_qe at ffffffffa0853234 [nvme_rdma]
 #4 [ffff880397f2bc20] nvme_rdma_destroy_queue_ib at ffffffffa0853dbf [nvme_rdma]
 #5 [ffff880397f2bc60] nvme_rdma_stop_and_free_queue at ffffffffa085402d [nvme_rdma]
 #6 [ffff880397f2bc80] nvme_rdma_reconnect_ctrl_work at ffffffffa0854957 [nvme_rdma]
 #7 [ffff880397f2bcb0] process_one_work at ffffffff810a1593
 #8 [ffff880397f2bd90] worker_thread at ffffffff810a222d
 #9 [ffff880397f2bec0] kthread at ffffffff810a6d6c
#10 [ffff880397f2bf50] ret_from_fork at ffffffff816e2cbf

So why is this request being processed during a reconnect?
[parent not found: <00fe01d1f3e8$8992b330$9cb81990$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery [not found] ` <00fe01d1f3e8$8992b330$9cb81990$@opengridcomputing.com> @ 2016-08-15 14:39 ` Steve Wise 2016-08-16 9:26 ` Sagi Grimberg 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-08-15 14:39 UTC (permalink / raw) > Ah, I see the nvme_rdma worker thread running > nvme_rdma_reconnect_ctrl_work() on the same nvme_rdma_queue that is > handling the request and crashing: > > crash> bt 371 > PID: 371 TASK: ffff8803975a4300 CPU: 5 COMMAND: "kworker/5:2" > [exception RIP: set_track+16] > RIP: ffffffff81202070 RSP: ffff880397f2ba18 RFLAGS: 00000086 > RAX: 0000000000000001 RBX: ffff88039f407a00 RCX: ffffffffa0853234 > RDX: 0000000000000001 RSI: ffff8801d663e008 RDI: ffff88039f407a00 > RBP: ffff880397f2ba48 R8: ffff8801d663e158 R9: 000000000000005a > R10: 00000000000000cc R11: 0000000000000000 R12: ffff8801d663e008 > R13: ffffea0007598f80 R14: 0000000000000001 R15: ffff8801d663e008 > CS: 0010 SS: 0018 > #0 [ffff880397f2ba50] free_debug_processing at ffffffff81204820 > #1 [ffff880397f2bad0] __slab_free at ffffffff81204bfb > #2 [ffff880397f2bb90] kfree at ffffffff81204dcd > #3 [ffff880397f2bc00] nvme_rdma_free_qe at ffffffffa0853234 [nvme_rdma] > #4 [ffff880397f2bc20] nvme_rdma_destroy_queue_ib at ffffffffa0853dbf > [nvme_rdma] > #5 [ffff880397f2bc60] nvme_rdma_stop_and_free_queue at ffffffffa085402d > [nvme_rdma] > #6 [ffff880397f2bc80] nvme_rdma_reconnect_ctrl_work at ffffffffa0854957 > [nvme_rdma] > #7 [ffff880397f2bcb0] process_one_work at ffffffff810a1593 > #8 [ffff880397f2bd90] worker_thread at ffffffff810a222d > #9 [ffff880397f2bec0] kthread at ffffffff810a6d6c > #10 [ffff880397f2bf50] ret_from_fork at ffffffff816e2cbf > > So why is this request being processed during a reconnect? Hey Sagi, Do you have any ideas on this crash? I could really use some help. 
Is it possible that recovery/reconnect/restart of a different controller is somehow restarting the requests for a controller still in recovery? Here is one issue perhaps: nvme_rdma_reconnect_ctrl_work() calls blk_mq_start_stopped_hw_queues() before calling nvme_rdma_init_io_queues(). Is that a problem? I tried moving blk_mq_start_stopped_hw_queues() to after the io queues are setup, but this causes a stall in nvme_rdma_reconnect_ctrl_work(). I think the blk queues need to be started to get the admin queue connected. Thoughts? Thanks, Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-08-15 14:39 ` Steve Wise @ 2016-08-16 9:26 ` Sagi Grimberg 2016-08-16 21:17 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Sagi Grimberg @ 2016-08-16 9:26 UTC (permalink / raw) On 15/08/16 17:39, Steve Wise wrote: > >> Ah, I see the nvme_rdma worker thread running >> nvme_rdma_reconnect_ctrl_work() on the same nvme_rdma_queue that is >> handling the request and crashing: >> >> crash> bt 371 >> PID: 371 TASK: ffff8803975a4300 CPU: 5 COMMAND: "kworker/5:2" >> [exception RIP: set_track+16] >> RIP: ffffffff81202070 RSP: ffff880397f2ba18 RFLAGS: 00000086 >> RAX: 0000000000000001 RBX: ffff88039f407a00 RCX: ffffffffa0853234 >> RDX: 0000000000000001 RSI: ffff8801d663e008 RDI: ffff88039f407a00 >> RBP: ffff880397f2ba48 R8: ffff8801d663e158 R9: 000000000000005a >> R10: 00000000000000cc R11: 0000000000000000 R12: ffff8801d663e008 >> R13: ffffea0007598f80 R14: 0000000000000001 R15: ffff8801d663e008 >> CS: 0010 SS: 0018 >> #0 [ffff880397f2ba50] free_debug_processing at ffffffff81204820 >> #1 [ffff880397f2bad0] __slab_free at ffffffff81204bfb >> #2 [ffff880397f2bb90] kfree at ffffffff81204dcd >> #3 [ffff880397f2bc00] nvme_rdma_free_qe at ffffffffa0853234 [nvme_rdma] >> #4 [ffff880397f2bc20] nvme_rdma_destroy_queue_ib at ffffffffa0853dbf >> [nvme_rdma] >> #5 [ffff880397f2bc60] nvme_rdma_stop_and_free_queue at ffffffffa085402d >> [nvme_rdma] >> #6 [ffff880397f2bc80] nvme_rdma_reconnect_ctrl_work at ffffffffa0854957 >> [nvme_rdma] >> #7 [ffff880397f2bcb0] process_one_work at ffffffff810a1593 >> #8 [ffff880397f2bd90] worker_thread at ffffffff810a222d >> #9 [ffff880397f2bec0] kthread at ffffffff810a6d6c >> #10 [ffff880397f2bf50] ret_from_fork at ffffffff816e2cbf >> >> So why is this request being processed during a reconnect? > > Hey Sagi, > > Do you have any ideas on this crash? I could really use some help. 
Not yet :( > Is it > possible that recovery/reconnect/restart of a different controller is somehow > restarting the requests for a controller still in recovery? I don't think this is the case. Can you try and find out if the request is from the admin tagset or from the io tagset? We rely on the fact that no I/O will be issued after we call nvme_stop_queues(). Can you trace that we indeed call nvme_stop_queues when we start error recovery, and do nvme_start_queues only when we successfully reconnect and not anywhere in between? If that is the case, I think we need to have a closer look at nvme_stop_queues... > Here is one issue > perhaps: nvme_rdma_reconnect_ctrl_work() calls blk_mq_start_stopped_hw_queues() > before calling nvme_rdma_init_io_queues(). Is that a problem? It's for the admin queue; without that you won't be able to issue the admin connect. > I tried moving > blk_mq_start_stopped_hw_queues() to after the io queues are setup, but this > causes a stall in nvme_rdma_reconnect_ctrl_work(). Makes sense... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-08-16 9:26 ` Sagi Grimberg @ 2016-08-16 21:17 ` Steve Wise 2016-08-17 18:57 ` Sagi Grimberg 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-08-16 21:17 UTC (permalink / raw) > > Hey Sagi, > > > > Do you have any ideas on this crash? I could really use some help. > > Not yet :( > > > Is it > > possible that recovery/reconnect/restart of a different controller is somehow > > restarting the requests for a controller still in recovery? > > I don't think this is the case. > Can you try and find out if the request is from the admin tagset or from > the io tagset? I wasn't sure how to tell easily which tagset the request was in, so instead I changed nvme_rdma_admin_mq_ops.queue_rq to its own distinct function, nvme_rdma_queue_admin_rq() which was a clone of nvme_rdma_queue_rq(). Thus when it crashed, I could tell by the stack trace. Anyway, the stack trace indicates it was from the io tagset. > > We rely on the fact that no I/O will be issued after we call > nvme_stop_queues(). can you trace that we indeed call nvme_stop_queues > when we start error recovery and do nvme_start_queues only when we > successfully reconnect and not anywhere in between? > I see from debugging that the nvme_ns->queue address of the queue being used that causes the crash was stopped via nvme_stop_queues(). It was never started via nvme_start_queues(). > If that is the case, I think we need to have a closer look at > nvme_stop_queues... > request_queue->queue_flags does have QUEUE_FLAG_STOPPED set: #define QUEUE_FLAG_STOPPED 2 /* queue is stopped */ crash> request_queue.queue_flags -x 0xffff880397a13d28 queue_flags = 0x1f07a04 crash> request_queue.mq_ops 0xffff880397a13d28 mq_ops = 0xffffffffa084b140 <nvme_rdma_mq_ops> So it appears the queue is stopped, yet a request is being processed for that queue. Perhaps there is a race where QUEUE_FLAG_STOPPED is set after a request is scheduled? 
^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-08-16 21:17 ` Steve Wise @ 2016-08-17 18:57 ` Sagi Grimberg 2016-08-17 19:07 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Sagi Grimberg @ 2016-08-17 18:57 UTC (permalink / raw) >> If that is the case, I think we need to have a closer look at >> nvme_stop_queues... >> > > request_queue->queue_flags does have QUEUE_FLAG_STOPPED set: > > #define QUEUE_FLAG_STOPPED 2 /* queue is stopped */ > > crash> request_queue.queue_flags -x 0xffff880397a13d28 > queue_flags = 0x1f07a04 > crash> request_queue.mq_ops 0xffff880397a13d28 > mq_ops = 0xffffffffa084b140 <nvme_rdma_mq_ops> > > So it appears the queue is stopped, yet a request is being processed for that > queue. Perhaps there is a race where QUEUE_FLAG_STOPPED is set after a request > is scheduled? Umm. When the keep-alive timeout triggers we stop the queues. Only 10 seconds (or reconnect_delay) later do we free the queues and reestablish them, so I find it hard to believe that a request was queued and spent so long in queue_rq until we freed the queue-pair. From your description of the sequence it seems that after 10 seconds we attempt a reconnect, and during that time an IO request crashes the party. I assume this means you ran traffic during the sequence, yes? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-08-17 18:57 ` Sagi Grimberg @ 2016-08-17 19:07 ` Steve Wise 2016-09-01 19:14 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-08-17 19:07 UTC (permalink / raw) > >> If that is the case, I think we need to have a closer look at > >> nvme_stop_queues... > >> > > > > request_queue->queue_flags does have QUEUE_FLAG_STOPPED set: > > > > #define QUEUE_FLAG_STOPPED 2 /* queue is stopped */ > > > > crash> request_queue.queue_flags -x 0xffff880397a13d28 > > queue_flags = 0x1f07a04 > > crash> request_queue.mq_ops 0xffff880397a13d28 > > mq_ops = 0xffffffffa084b140 <nvme_rdma_mq_ops> > > > > So it appears the queue is stopped, yet a request is being processed for > that > > queue. Perhaps there is a race where QUEUE_FLAG_STOPPED is set after a > request > > is scheduled? > > Umm. When the keep-alive timeout triggers we stop the queues. only 10 > seconds (or reconnect_delay) later we free the queues and reestablish > them, so I find it hard to believe that a request was queued, and spent > so long in queue_rq until we freed the queue-pair. I agree. > > From you description of the sequence it seems that after 10 seconds we > attempt a reconnect and during that time an IO request crashes the > party. > Yes. > I assume this means you ran traffic during the sequence yes? Mega fio test streaming to all 10 devices. I start the following script, and then bring the link down a few seconds later, which triggers the kato, then 10 seconds later reconnecting starts and whamo... 
for i in $(seq 1 20) ; do

    fio --ioengine=libaio --rw=randwrite --name=randwrite --size=200m --direct=1 \
        --invalidate=1 --fsync_on_close=1 --group_reporting --exitall --runtime=20 \
        --time_based --filename=/dev/nvme0n1 --filename=/dev/nvme1n1 \
        --filename=/dev/nvme2n1 --filename=/dev/nvme3n1 --filename=/dev/nvme4n1 \
        --filename=/dev/nvme5n1 --filename=/dev/nvme6n1 --filename=/dev/nvme7n1 \
        --filename=/dev/nvme8n1 --filename=/dev/nvme9n1 --iodepth=4 --numjobs=32 \
        --bs=2K | grep -i "aggrb\|iops"
    sleep 3
    echo "### Iteration $i Done ###"
done

^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-08-17 19:07 ` Steve Wise @ 2016-09-01 19:14 ` Steve Wise 2016-09-04 9:17 ` Sagi Grimberg 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-01 19:14 UTC (permalink / raw) > > > > > > So it appears the queue is stopped, yet a request is being processed for > > that > > > queue. Perhaps there is a race where QUEUE_FLAG_STOPPED is set after a > > request > > > is scheduled? > > > > Umm. When the keep-alive timeout triggers we stop the queues. only 10 > > seconds (or reconnect_delay) later we free the queues and reestablish > > them, so I find it hard to believe that a request was queued, and spent > > so long in queue_rq until we freed the queue-pair. > > I agree. > > > > > From you description of the sequence it seems that after 10 seconds we > > attempt a reconnect and during that time an IO request crashes the > > party. > > > > Yes. > > > I assume this means you ran traffic during the sequence yes? > > Mega fio test streaming to all 10 devices. I start the following script, > and then bring the link down a few seconds later, which triggers the kato, > then 10 seconds later reconnecting starts and whamo... > > > for i in $(seq 1 20) ; do > > fio --ioengine=libaio --rw=randwrite --name=randwrite --size=200m > --direct=1 \ > --invalidate=1 --fsync_on_close=1 --group_reporting --exitall > --runtime=20 \ > --time_based --filename=/dev/nvme0n1 --filename=/dev/nvme1n1 \ > --filename=/dev/nvme2n1 --filename=/dev/nvme3n1 > --filename=/dev/nvme4n1 \ > --filename=/dev/nvme5n1 --filename=/dev/nvme6n1 > --filename=/dev/nvme7n1 \ > --filename=/dev/nvme8n1 --filename=/dev/nvme9n1 --iodepth=4 > --numjobs=32 \ > --bs=2K |grep -i "aggrb\|iops" > sleep 3 > echo "### Iteration $i Done ###" > done > Ok, back to this issue. :) The same crash happens with mlx4_ib, so this isn't related to cxgb4. 
To sum up: With pending NVME IO on the nvme-rdma host, and in the presence of kato recovery/reconnect due to the target going away, some NVME requests get restarted that are referencing nvmf controllers that have freed queues. I see this also with my recent v4 series that corrects the recovery problems with nvme-rdma when the target is down, but without pending IO. So the crash in this email is yet another issue that we see when the nvme host has lots of pending IO requests during kato recovery/reconnect... My findings to date: the IO is not an admin queue IO. It is not the kato messages. The io queue has been stopped, yet the request is attempted and causes the crash. Any help is appreciated... Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-01 19:14 ` Steve Wise @ 2016-09-04 9:17 ` Sagi Grimberg 2016-09-07 21:08 ` Steve Wise 2016-09-07 21:33 ` Steve Wise 0 siblings, 2 replies; 53+ messages in thread From: Sagi Grimberg @ 2016-09-04 9:17 UTC (permalink / raw) Hey Steve, > Ok, back to this issue. :) > > The same crash happens with mlx4_ib, so this isn't related to cxgb4. To sum up: > > With pending NVME IO on the nvme-rdma host, and in the presence of kato > recovery/reconnect due to the target going away, some NVME requests get > restarted that are referencing nvmf controllers that have freed queues. I see > this also with my recent v4 series that corrects the recovery problems with > nvme-rdma when the target is down, but without pending IO. > > So the crash in this email is yet another issue that we see when the nvme host > has lots of pending IO requests during kato recovery/reconnect... > > My findings to date: the IO is not an admin queue IO. It is not the kato > messages. The io queue has been stopped, yet the request is attempted and > causes the crash. > > Any help is appreciated... So in the current state, my impression is that we are seeing a request queued when we shouldn't (or at least assume we won't). Given that you run heavy load to reproduce this, I can only suspect that this is a race condition. Does this happen if you change the reconnect delay to be something different than 10 seconds? (say 30?) Can you also give patch [1] a try? It's not a solution, but I want to see if it hides the problem... Now, given that you already verified that the queues are stopped with BLK_MQ_S_STOPPED, I'm looking at blk-mq now. I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take BLK_MQ_S_STOPPED into account. 
Theoretically, if we free the queue pairs after we passed these checks while the rq_list is being processed, then we can end up with this condition; but given that it takes essentially forever (10 seconds), I tend to doubt this is the case. HCH, Jens, Keith, any useful pointers for us? To summarize, we see a stray request being queued long after we set BLK_MQ_S_STOPPED (and by long I mean 10 seconds).

[1]:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d2f891efb27b..38ea5dab4524 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -701,20 +701,13 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
 	bool changed;
 	int ret;
 
-	if (ctrl->queue_count > 1) {
-		nvme_rdma_free_io_queues(ctrl);
-
-		ret = blk_mq_reinit_tagset(&ctrl->tag_set);
-		if (ret)
-			goto requeue;
-	}
-
-	nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
 	ret = blk_mq_reinit_tagset(&ctrl->admin_tag_set);
 	if (ret)
 		goto requeue;
 
+	nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
+
 	ret = nvme_rdma_init_queue(ctrl, 0, NVMF_AQ_DEPTH);
 	if (ret)
 		goto requeue;
@@ -732,6 +725,12 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
 	nvme_start_keep_alive(&ctrl->ctrl);
 
 	if (ctrl->queue_count > 1) {
+		ret = blk_mq_reinit_tagset(&ctrl->tag_set);
+		if (ret)
+			goto stop_admin_q;
+
+		nvme_rdma_free_io_queues(ctrl);
+
 		ret = nvme_rdma_init_io_queues(ctrl);
 		if (ret)
 			goto stop_admin_q;
--

^ permalink raw reply related [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-04 9:17 ` Sagi Grimberg @ 2016-09-07 21:08 ` Steve Wise 2016-09-08 7:45 ` Sagi Grimberg [not found] ` <7f09e373-6316-26a3-ae81-dab1205d88ab@grimbe rg.me> 2016-09-07 21:33 ` Steve Wise 1 sibling, 2 replies; 53+ messages in thread From: Steve Wise @ 2016-09-07 21:08 UTC (permalink / raw) > Hey Steve, > > > Ok, back to this issue. :) > > > > The same crash happens with mlx4_ib, so this isn't related to cxgb4. To sum up: > > > > With pending NVME IO on the nvme-rdma host, and in the presence of kato > > recovery/reconnect due to the target going away, some NVME requests get > > restarted that are referencing nvmf controllers that have freed queues. I see > > this also with my recent v4 series that corrects the recovery problems with > > nvme-rdma when the target is down, but without pending IO. > > > > So the crash in this email is yet another issue that we see when the nvme host > > has lots of pending IO requests during kato recovery/reconnect... > > > > My findings to date: the IO is not an admin queue IO. It is not the kato > > messages. The io queue has been stopped, yet the request is attempted and > > causes the crash. > > > > Any help is appreciated... > > So in the current state, my impression is that we are seeing a request > queued when we shouldn't (or at least assume we won't). > > Given that you run heavy load to reproduce this, I can only suspect that > this is a race condition. > > Does this happen if you change the reconnect delay to be something > different than 10 seconds? (say 30?) > Yes. But I noticed something when performing this experiment that is an important point, I think: if I just bring the network interface down and leave it down, we don't crash. During this state, I see the host continually reconnecting after the reconnect delay time, timing out trying to reconnect, and retrying after another reconnect_delay period. I see this for all 10 targets of course. 
The crash only happens when I bring the interface back up, and the targets begin to reconnect. So the process of successfully reconnecting the RDMA QPs, and restarting the nvme queues is somehow triggering running an nvme request too soon (or perhaps on the wrong queue). > Can you also give patch [1] a try? It's not a solution, but I want > to see if it hides the problem... > hmm. I ran the experiment once with [1] and it didn't crash. I ran it a 2nd time and hit a new crash. Maybe a problem with [1]? [ 379.864950] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024 [ 379.874330] IP: [<ffffffffa0307ad1>] ib_destroy_qp+0x21/0x1a0 [ib_core] [ 379.882489] PGD 1002571067 PUD fa121f067 PMD 0 [ 379.888561] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 379.894526] Modules linked in: nvme_rdma nvme_fabrics brd iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4] [ 380.005238] CPU: 30 PID: 10981 Comm: kworker/30:2 Tainted: G E 4.8.0-rc4-block-for-linus-dbg+ #32 [ 380.017596] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 [ 380.027002] 
Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] [ 380.036664] task: ffff880fd014af40 task.stack: ffff881013b38000 [ 380.045082] RIP: 0010:[<ffffffffa0307ad1>] [<ffffffffa0307ad1>] ib_destroy_qp+0x21/0x1a0 [ib_core] [ 380.056562] RSP: 0018:ffff881013b3bb88 EFLAGS: 00010286 [ 380.064291] RAX: ffff880fd014af40 RBX: ffff880fba6d3a78 RCX: 0000000000000000 [ 380.073733] RDX: ffff881037593d20 RSI: ffff88103758dfc8 RDI: 0000000000000000 [ 380.083030] RBP: ffff881013b3bbe8 R08: 0000000000000000 R09: 0000000000000000 [ 380.092230] R10: ffffffffa0030060 R11: 0000000000000000 R12: ffff880fba6d3c98 [ 380.101442] R13: 00000000ffffff98 R14: ffff88101ce78008 R15: ffff88100507c968 [ 380.110541] FS: 0000000000000000(0000) GS:ffff881037580000(0000) knlGS:0000000000000000 [ 380.120507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 380.128048] CR2: 0000000000000024 CR3: 0000000fa12f6000 CR4: 00000000000406e0 [ 380.136990] Stack: [ 380.140714] ffff881013b3bb98 0000000000000246 000003e800000286 ffffffffa0733fe0 [ 380.149939] ffff881013b3bbd8 ffffffffa0274588 ffff881013b3bbe8 ffff880fba6d3a78 [ 380.159113] ffff880fba6d3c98 00000000ffffff98 ffff88101ce78008 ffff88100507c968 [ 380.168205] Call Trace: [ 380.172280] [<ffffffffa0733fe0>] ? cma_work_handler+0xa0/0xa0 [rdma_cm] [ 380.180546] [<ffffffffa0731611>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [ 380.188483] [<ffffffffa0661e92>] nvme_rdma_destroy_queue_ib+0x52/0xb0 [nvme_rdma] [ 380.197566] [<ffffffffa0662678>] nvme_rdma_init_queue+0x128/0x180 [nvme_rdma] [ 380.206264] [<ffffffffa0662a09>] nvme_rdma_reconnect_ctrl_work+0x79/0x220 [nvme_rdma] [ 380.215606] [<ffffffff810a15e3>] process_one_work+0x183/0x4d0 [ 380.222820] [<ffffffff816de9f0>] ? __schedule+0x1f0/0x5b0 [ 380.229680] [<ffffffff816deeb0>] ? schedule+0x40/0xb0 [ 380.236147] [<ffffffff810a227d>] worker_thread+0x16d/0x530 [ 380.243051] [<ffffffff810b43d5>] ? trace_event_raw_event_sched_switch+0xe5/0x130 [ 380.251821] [<ffffffff816de9f0>] ? 
__schedule+0x1f0/0x5b0 [ 380.258600] [<ffffffff810cba86>] ? __wake_up_common+0x56/0x90 [ 380.265676] [<ffffffff810a2110>] ? maybe_create_worker+0x120/0x120 [ 380.273153] [<ffffffff816deeb0>] ? schedule+0x40/0xb0 [ 380.279504] [<ffffffff810a2110>] ? maybe_create_worker+0x120/0x120 [ 380.286993] [<ffffffff810a6dbc>] kthread+0xcc/0xf0 [ 380.293078] [<ffffffff810b177e>] ? schedule_tail+0x1e/0xc0 [ 380.299857] [<ffffffff816e2b7f>] ret_from_fork+0x1f/0x40 ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-07 21:08 ` Steve Wise @ 2016-09-08 7:45 ` Sagi Grimberg 2016-09-08 20:47 ` Steve Wise 2016-09-08 21:00 ` Steve Wise [not found] ` <7f09e373-6316-26a3-ae81-dab1205d88ab@grimbe rg.me> 1 sibling, 2 replies; 53+ messages in thread From: Sagi Grimberg @ 2016-09-08 7:45 UTC (permalink / raw) >> Does this happen if you change the reconnect delay to be something >> different than 10 seconds? (say 30?) >> > > Yes. But I noticed something when performing this experiment that is an > important point, I think: if I just bring the network interface down and leave > it down, we don't crash. During this state, I see the host continually > reconnecting after the reconnect delay time, timing out trying to reconnect, and > retrying after another reconnect_delay period. I see this for all 10 targets of > course. The crash only happens when I bring the interface back up, and the > targets begin to reconnect. So the process of successfully reconnecting the > RDMA QPs, and restarting the nvme queues is somehow triggering running an nvme > request too soon (or perhaps on the wrong queue). Interesting. Given this is easy to reproduce, can you record the: (request_tag, *queue, *qp) for each request submitted? I'd like to see that the *queue stays the same for each tag but the *qp indeed changes. >> Can you also give patch [1] a try? It's not a solution, but I want >> to see if it hides the problem... >> > > hmm. I ran the experiment once with [1] and it didn't crash. I ran it a 2nd > time and hit a new crash. Maybe a problem with [1]? Strange, I don't see how we can visit rdma_destroy_qp twice given that we have NVME_RDMA_IB_QUEUE_ALLOCATED bit protecting it. 
Not sure if it fixes anything, but we probably need it regardless; can you give another go with this on top:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 43602cebf097..89023326f397 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -542,11 +542,12 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
 		goto out_destroy_qp;
 	}
 	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
+	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
 
 	return 0;
 
 out_destroy_qp:
-	ib_destroy_qp(queue->qp);
+	rdma_destroy_qp(queue->qp);
 out_destroy_ib_cq:
 	ib_free_cq(queue->ib_cq);
 out:
@@ -591,15 +592,16 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 	if (ret) {
 		dev_info(ctrl->ctrl.device,
 			"rdma_resolve_addr wait failed (%d).\n", ret);
-		goto out_destroy_cm_id;
+		goto out_destroy_queue_ib;
 	}
 
 	set_bit(NVME_RDMA_Q_CONNECTED, &queue->flags);
 
 	return 0;
 
-out_destroy_cm_id:
+out_destroy_queue_ib:
 	nvme_rdma_destroy_queue_ib(queue);
+out_destroy_cm_id:
 	rdma_destroy_id(queue->cm_id);
 	return ret;
 }
--

^ permalink raw reply related [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-08 7:45 ` Sagi Grimberg @ 2016-09-08 20:47 ` Steve Wise 2016-09-08 21:00 ` Steve Wise 1 sibling, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-08 20:47 UTC (permalink / raw) > >> Does this happen if you change the reconnect delay to be something > >> different than 10 seconds? (say 30?) > >> > > > > Yes. But I noticed something when performing this experiment that is an > > important point, I think: if I just bring the network interface down and leave > > it down, we don't crash. During this state, I see the host continually > > reconnecting after the reconnect delay time, timing out trying to reconnect, and > > retrying after another reconnect_delay period. I see this for all 10 targets of > > course. The crash only happens when I bring the interface back up, and the > > targets begin to reconnect. So the process of successfully reconnecting the > > RDMA QPs, and restarting the nvme queues is somehow triggering running an > nvme > > request too soon (or perhaps on the wrong queue). > > Interesting. Given this is easy to reproduce, can you record the: > (request_tag, *queue, *qp) for each request submitted? > > I'd like to see that the *queue stays the same for each tag > but the *qp indeed changes. > I tried this, and didn't hit the BUG_ON(), yet still hit the crash. I believe this verifies that *queue never changed... 
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c075ea5..a77729e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -76,6 +76,7 @@ struct nvme_rdma_request {
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue  *queue;
+	struct nvme_rdma_queue  *save_queue;
 	struct sg_table		sg_table;
 	struct scatterlist	first_sgl[];
 };
@@ -354,6 +355,8 @@ static int __nvme_rdma_init_request(struct nvme_rdma_ctrl *ctrl,
 	}
 
 	req->queue = queue;
+	if (!req->save_queue)
+		req->save_queue = queue;
 
 	return 0;
 
@@ -1434,6 +1436,9 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	WARN_ON_ONCE(rq->tag < 0);
 
+	BUG_ON(queue != req->queue);
+	BUG_ON(queue != req->save_queue);
+
 	dev = queue->device->dev;
 	ib_dma_sync_single_for_cpu(dev, sqe->dma,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);

^ permalink raw reply related [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-08 7:45 ` Sagi Grimberg 2016-09-08 20:47 ` Steve Wise @ 2016-09-08 21:00 ` Steve Wise 1 sibling, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-08 21:00 UTC (permalink / raw) > > >> Can you also give patch [1] a try? It's not a solution, but I want > >> to see if it hides the problem... > >> > > > > hmm. I ran the experiment once with [1] and it didn't crash. I ran it a 2nd > > time and hit a new crash. Maybe a problem with [1]? > > Strange, I don't see how we can visit rdma_destroy_qp twice given > that we have NVME_RDMA_IB_QUEUE_ALLOCATED bit protecting it. > > Not sure if it fixes anything, but we probably need it regardless, can > you give another go with this on top: Still hit it with this on top (had to tweak the patch a little). Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
[parent not found: <7f09e373-6316-26a3-ae81-dab1205d88ab@grimberg.me>]
[parent not found: <021201d20a14$0f203b80$2d60b280$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery [not found] ` <021201d20a14$0f203b80$2d60b280$@opengridcomputing.com> @ 2016-09-08 21:21 ` Steve Wise 0 siblings, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-08 21:21 UTC (permalink / raw) > > > > >> Can you also give patch [1] a try? It's not a solution, but I want > > >> to see if it hides the problem... > > >> > > > > > > hmm. I ran the experiment once with [1] and it didn't crash. I ran > it a 2nd > > > time and hit a new crash. Maybe a problem with [1]? > > > > Strange, I don't see how we can visit rdma_destroy_qp twice given > > that we have NVME_RDMA_IB_QUEUE_ALLOCATED bit protecting it. > > > > Not sure if it fixes anything, but we probably need it regardless, can > > you give another go with this on top: > > Still hit it with this on top (had to tweak the patch a little). > > Steve. So with this patch, the crash is a little different. One thread is in the usual place crashed in nvme_rdma_post_send() called by nvme_rdma_queue_rq() because the qp and cm_id in the nvme_rdma_queue have been freed. Actually there are a handful of CPUs processing different requests in the same type stack trace. But perhaps that is expected given the work load and number of controllers (10) and queues (32 per controller)... 
I also see another worker thread here: PID: 3769 TASK: ffff880e18972f40 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff880e2f7938d0] __schedule at ffffffff816dfa17 #1 [ffff880e2f793930] schedule at ffffffff816dff00 #2 [ffff880e2f793980] schedule_timeout at ffffffff816e2b1b #3 [ffff880e2f793a60] wait_for_completion_timeout at ffffffff816e0f03 #4 [ffff880e2f793ad0] destroy_cq at ffffffffa061d8f3 [iw_cxgb4] #5 [ffff880e2f793b60] c4iw_destroy_cq at ffffffffa061dad5 [iw_cxgb4] #6 [ffff880e2f793bf0] ib_free_cq at ffffffffa0360e5a [ib_core] #7 [ffff880e2f793c20] nvme_rdma_destroy_queue_ib at ffffffffa0644e9b [nvme_rdma] #8 [ffff880e2f793c60] nvme_rdma_stop_and_free_queue at ffffffffa0645083 [nvme_rdma] #9 [ffff880e2f793c80] nvme_rdma_reconnect_ctrl_work at ffffffffa0645a9f [nvme_rdma] #10 [ffff880e2f793cb0] process_one_work at ffffffff810a1613 #11 [ffff880e2f793d90] worker_thread at ffffffff810a22ad #12 [ffff880e2f793ec0] kthread at ffffffff810a6dec #13 [ffff880e2f793f50] ret_from_fork at ffffffff816e3bbf I'm trying to identify if this reconnect is for the same controller that crashed processing a request. It probably is, but I need to search the stack frame to try and find the controller pointer... Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
[parent not found: <021401d20a16$ed60d470$c8227d50$@opengridcomputing.com>]
[parent not found: <021501d20a19$327ba5b0$9772f110$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery [not found] ` <021401d20a16$ed60d470$c8227d50$@opengridcomputing.com> [not found] ` <021501d20a19$327ba5b0$9772f110$@opengridcomputing.com> @ 2016-09-08 21:37 ` Steve Wise 2016-09-09 15:50 ` Steve Wise 1 sibling, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-08 21:37 UTC (permalink / raw) > > > Not sure if it fixes anything, but we probably need it regardless, can > > > you give another go with this on top: > > > > Still hit it with this on top (had to tweak the patch a little). > > > > Steve. > > So with this patch, the crash is a little different. One thread is in the > usual place crashed in nvme_rdma_post_send() called by > nvme_rdma_queue_rq() because the qp and cm_id in the nvme_rdma_queue have > been freed. Actually there are a handful of CPUs processing different > requests in the same type stack trace. But perhaps that is expected given > the work load and number of controllers (10) and queues (32 per > controller)... 
> > I also see another worker thread here: > > PID: 3769 TASK: ffff880e18972f40 CPU: 3 COMMAND: "kworker/3:3" > #0 [ffff880e2f7938d0] __schedule at ffffffff816dfa17 > #1 [ffff880e2f793930] schedule at ffffffff816dff00 > #2 [ffff880e2f793980] schedule_timeout at ffffffff816e2b1b > #3 [ffff880e2f793a60] wait_for_completion_timeout at ffffffff816e0f03 > #4 [ffff880e2f793ad0] destroy_cq at ffffffffa061d8f3 [iw_cxgb4] > #5 [ffff880e2f793b60] c4iw_destroy_cq at ffffffffa061dad5 [iw_cxgb4] > #6 [ffff880e2f793bf0] ib_free_cq at ffffffffa0360e5a [ib_core] > #7 [ffff880e2f793c20] nvme_rdma_destroy_queue_ib at ffffffffa0644e9b > [nvme_rdma] > #8 [ffff880e2f793c60] nvme_rdma_stop_and_free_queue at ffffffffa0645083 > [nvme_rdma] > #9 [ffff880e2f793c80] nvme_rdma_reconnect_ctrl_work at ffffffffa0645a9f > [nvme_rdma] > #10 [ffff880e2f793cb0] process_one_work at ffffffff810a1613 > #11 [ffff880e2f793d90] worker_thread at ffffffff810a22ad > #12 [ffff880e2f793ec0] kthread at ffffffff810a6dec > #13 [ffff880e2f793f50] ret_from_fork at ffffffff816e3bbf > > I'm trying to identify if this reconnect is for the same controller that > crashed processing a request. It probably is, but I need to search the > stack frame to try and find the controller pointer... > > Steve. Both the thread that crashed and the thread doing reconnect are operating on ctrl "nvme2"... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-08 21:37 ` Steve Wise @ 2016-09-09 15:50 ` Steve Wise 2016-09-12 20:10 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-09 15:50 UTC (permalink / raw) > > So with this patch, the crash is a little different. One thread is in the > > usual place crashed in nvme_rdma_post_send() called by > > nvme_rdma_queue_rq() because the qp and cm_id in the nvme_rdma_queue > have > > been freed. Actually there are a handful of CPUs processing different > > requests in the same type stack trace. But perhaps that is expected given > > the work load and number of controllers (10) and queues (32 per > > controller)... > > > > I also see another worker thread here: > > > > PID: 3769 TASK: ffff880e18972f40 CPU: 3 COMMAND: "kworker/3:3" > > #0 [ffff880e2f7938d0] __schedule at ffffffff816dfa17 > > #1 [ffff880e2f793930] schedule at ffffffff816dff00 > > #2 [ffff880e2f793980] schedule_timeout at ffffffff816e2b1b > > #3 [ffff880e2f793a60] wait_for_completion_timeout at ffffffff816e0f03 > > #4 [ffff880e2f793ad0] destroy_cq at ffffffffa061d8f3 [iw_cxgb4] > > #5 [ffff880e2f793b60] c4iw_destroy_cq at ffffffffa061dad5 [iw_cxgb4] > > #6 [ffff880e2f793bf0] ib_free_cq at ffffffffa0360e5a [ib_core] > > #7 [ffff880e2f793c20] nvme_rdma_destroy_queue_ib at ffffffffa0644e9b > > [nvme_rdma] > > #8 [ffff880e2f793c60] nvme_rdma_stop_and_free_queue at ffffffffa0645083 > > [nvme_rdma] > > #9 [ffff880e2f793c80] nvme_rdma_reconnect_ctrl_work at ffffffffa0645a9f > > [nvme_rdma] > > #10 [ffff880e2f793cb0] process_one_work at ffffffff810a1613 > > #11 [ffff880e2f793d90] worker_thread at ffffffff810a22ad > > #12 [ffff880e2f793ec0] kthread at ffffffff810a6dec > > #13 [ffff880e2f793f50] ret_from_fork at ffffffff816e3bbf > > > > I'm trying to identify if this reconnect is for the same controller that > > crashed processing a request. 
It probably is, but I need to search the > > stack frame to try and find the controller pointer... > > > > Steve. > > Both the thread that crashed and the thread doing reconnect are operating on > ctrl "nvme2"... I'm reanalyzing the crash dump for this particular crash, and I've found the blk_mq_hw_ctx struct that has ->driver_data == to the nvme_rdma_queue that caused the crash. hctx->state, though, is 2, which is the BLK_MQ_S_TAG_ACTIVE bit. IE the BLK_MQ_S_STOPPED bit is _not_ set! Attached are the blk_mq_hw_ctx, nvme_rdma_queue, and nvme_rdma_ctrl structs, as well as the nvme_rdma_request, request and request_queue structs if you want to have a look... Steve. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: crash_analysis.txt URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20160909/6979423a/attachment-0001.txt> ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-09 15:50 ` Steve Wise @ 2016-09-12 20:10 ` Steve Wise [not found] ` <da2e918b-0f18-e032-272d-368c6ec49c62@gri mberg.me> ` (2 more replies) 0 siblings, 3 replies; 53+ messages in thread From: Steve Wise @ 2016-09-12 20:10 UTC (permalink / raw) > I'm reanalyzing the crash dump for this particular crash, and I've found the > blk_mq_hw_ctx struct that has ->driver_data == to the nvme_rdma_queue that > caused the crash. hctx->state, though, is 2, which is the BLK_MQ_S_TAG_ACTIVE > bit. IE the BLK_MQ_S_STOPPED bit is _not_ set! > > Attached are the blk_mq_hw_ctx, nvme_rdma_queue, and nvme_rdma_ctrl structs, > as > well as the nvme_rdma_requeust, request and request_queue structs if you want to > have a look... > > Steve. > Hey Sagi, I've added this debug logic to try and assert that an nvme_rdma_queue is associated with one and only one blk_mq_hw_ctx: --- diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 64434a1..ab7edf7 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -100,6 +100,7 @@ struct nvme_rdma_queue { struct rdma_cm_id *cm_id; int cm_error; struct completion cm_done; + struct blk_mq_hw_ctx *hctx; }; struct nvme_rdma_ctrl { @@ -384,6 +385,7 @@ static int nvme_rdma_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, BUG_ON(hctx_idx >= ctrl->queue_count); hctx->driver_data = queue; + queue->hctx = hctx; return 0; } @@ -396,6 +398,7 @@ static int nvme_rdma_init_admin_hctx(struct blk_mq_hw_ctx *hctx, void *data, BUG_ON(hctx_idx != 0); hctx->driver_data = queue; + queue->hctx = hctx; return 0; } @@ -622,6 +625,7 @@ static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue) { if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) return; + BUG_ON(!test_bit(BLK_MQ_S_STOPPED, &queue->hctx->state)); nvme_rdma_stop_queue(queue); nvme_rdma_free_queue(queue); } @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx, 
WARN_ON_ONCE(rq->tag < 0); + BUG_ON(hctx != queue->hctx); + BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state)); dev = queue->device->dev; ib_dma_sync_single_for_cpu(dev, sqe->dma, sizeof(struct nvme_command), DMA_TO_DEVICE); --- When I reran the test forcing reconnects, I hit the BUG_ON(hctx != queue->hctx) in nvme_rdma_queue_rq() when doing the first reconnect (not when initially connecting the targets). Here is the back trace. Is my debug logic flawed? Or does this mean something is screwed up once we start reconnecting. crash> bt PID: 1819 TASK: ffff88101d0217c0 CPU: 0 COMMAND: "kworker/0:2" #0 [ffff8810090d34b0] machine_kexec at ffffffff8105fbd0 #1 [ffff8810090d3520] __crash_kexec at ffffffff81116998 #2 [ffff8810090d35f0] crash_kexec at ffffffff81116a6d #3 [ffff8810090d3620] oops_end at ffffffff81032bd6 #4 [ffff8810090d3650] die at ffffffff810330cb #5 [ffff8810090d3680] do_trap at ffffffff8102fff1 #6 [ffff8810090d36e0] do_error_trap at ffffffff8103032d #7 [ffff8810090d37a0] do_invalid_op at ffffffff81030480 #8 [ffff8810090d37b0] invalid_op at ffffffff816e47be [exception RIP: nvme_rdma_queue_rq+621] RIP: ffffffffa065ce3d RSP: ffff8810090d3868 RFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff880e33640000 RCX: dead000000000200 RDX: ffff8810090d3928 RSI: ffff8810090d38f8 RDI: ffff880e315cb528 RBP: ffff8810090d38a8 R8: ffff880e33640000 R9: 0000000000000000 R10: 0000000000000674 R11: ffff8810090d3a18 R12: ffff880e36ab91d0 R13: ffff880e33640170 R14: ffff880e315cb528 R15: ffff880e36bc1138 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff8810090d38b0] __blk_mq_run_hw_queue at ffffffff81338b1b #10 [ffff8810090d3a00] blk_mq_run_hw_queue at ffffffff81338ffe #11 [ffff8810090d3a20] blk_mq_insert_request at ffffffff8133a130 #12 [ffff8810090d3a90] blk_execute_rq_nowait at ffffffff813342dd #13 [ffff8810090d3ad0] blk_execute_rq at ffffffff8133442e #14 [ffff8810090d3b80] __nvme_submit_sync_cmd at ffffffffa02715d5 [nvme_core] #15 [ffff8810090d3bd0] nvmf_connect_io_queue at 
ffffffffa064d134 [nvme_fabrics] #16 [ffff8810090d3c80] nvme_rdma_reconnect_ctrl_work at ffffffffa065cafb [nvme_rdma] #17 [ffff8810090d3cb0] process_one_work at ffffffff810a1613 #18 [ffff8810090d3d90] worker_thread at ffffffff810a22ad #19 [ffff8810090d3ec0] kthread at ffffffff810a6dec #20 [ffff8810090d3f50] ret_from_fork at ffffffff816e3bbf crash> gdb list *nvme_rdma_queue_rq+621 0xffffffffa065ce3d is in nvme_rdma_queue_rq (drivers/nvme/host/rdma.c:1415). 1410 unsigned int map_len; 1411 int ret; 1412 1413 WARN_ON_ONCE(rq->tag < 0); 1414 1415 BUG_ON(hctx != queue->hctx); 1416 BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state)); 1417 dev = queue->device->dev; 1418 ib_dma_sync_single_for_cpu(dev, sqe->dma, 1419 sizeof(struct nvme_command), DMA_TO_DEVICE); ^ permalink raw reply related [flat|nested] 53+ messages in thread
[parent not found: <da2e918b-0f18-e032-272d-368c6ec49c62@grimberg.me>]
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-12 20:10 ` Steve Wise [not found] ` <da2e918b-0f18-e032-272d-368c6ec49c62@gri mberg.me> @ 2016-09-15 9:53 ` Sagi Grimberg 2016-09-15 14:44 ` Steve Wise 2016-09-15 14:00 ` Gabriel Krisman Bertazi 2 siblings, 1 reply; 53+ messages in thread From: Sagi Grimberg @ 2016-09-15 9:53 UTC (permalink / raw) > @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx, > > WARN_ON_ONCE(rq->tag < 0); > > + BUG_ON(hctx != queue->hctx); > + BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state)); > dev = queue->device->dev; > ib_dma_sync_single_for_cpu(dev, sqe->dma, > sizeof(struct nvme_command), DMA_TO_DEVICE); > --- > > When I reran the test forcing reconnects, I hit the BUG_ON(hctx != queue->hctx) > in nvme_rdma_queue_rq() when doing the first reconnect (not when initially > connecting the targets). Here is the back trace. Is my debug logic flawed? > Or does this mean something is screwed up once we start reconnecting. This is weird indeed. The fact that you trigger this means that you successfully reconnect correct? If queue is corrupted it would explain the bogus post on a freed or non-existing qp... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 9:53 ` Sagi Grimberg @ 2016-09-15 14:44 ` Steve Wise 2016-09-15 15:10 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-15 14:44 UTC (permalink / raw) > > @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx > *hctx, > > > > WARN_ON_ONCE(rq->tag < 0); > > > > + BUG_ON(hctx != queue->hctx); > > + BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state)); > > dev = queue->device->dev; > > ib_dma_sync_single_for_cpu(dev, sqe->dma, > > sizeof(struct nvme_command), DMA_TO_DEVICE); > > --- > > > > When I reran the test forcing reconnects, I hit the BUG_ON(hctx != queue->hctx) > > in nvme_rdma_queue_rq() when doing the first reconnect (not when initially > > connecting the targets). Here is the back trace. Is my debug logic flawed? > > Or does this mean something is screwed up once we start reconnecting. > > This is weird indeed. > > The fact that you trigger this means that you successfully reconnect > correct? > The state of the controller is NVME_CTRL_RECONNECTING. In fact, this BUG_ON() happened on the reconnect worker thread. Ah, this is probably the connect command on the admin queue maybe? 
PID: 1819 TASK: ffff88101d0217c0 CPU: 0 COMMAND: "kworker/0:2" #0 [ffff8810090d34b0] machine_kexec at ffffffff8105fbd0 #1 [ffff8810090d3520] __crash_kexec at ffffffff81116998 #2 [ffff8810090d35f0] crash_kexec at ffffffff81116a6d #3 [ffff8810090d3620] oops_end at ffffffff81032bd6 #4 [ffff8810090d3650] die at ffffffff810330cb #5 [ffff8810090d3680] do_trap at ffffffff8102fff1 #6 [ffff8810090d36e0] do_error_trap at ffffffff8103032d #7 [ffff8810090d37a0] do_invalid_op at ffffffff81030480 #8 [ffff8810090d37b0] invalid_op at ffffffff816e47be [exception RIP: nvme_rdma_queue_rq+621] RIP: ffffffffa065ce3d RSP: ffff8810090d3868 RFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff880e33640000 RCX: dead000000000200 RDX: ffff8810090d3928 RSI: ffff8810090d38f8 RDI: ffff880e315cb528 RBP: ffff8810090d38a8 R8: ffff880e33640000 R9: 0000000000000000 R10: 0000000000000674 R11: ffff8810090d3a18 R12: ffff880e36ab91d0 R13: ffff880e33640170 R14: ffff880e315cb528 R15: ffff880e36bc1138 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff8810090d38b0] __blk_mq_run_hw_queue at ffffffff81338b1b #10 [ffff8810090d3a00] blk_mq_run_hw_queue at ffffffff81338ffe #11 [ffff8810090d3a20] blk_mq_insert_request at ffffffff8133a130 #12 [ffff8810090d3a90] blk_execute_rq_nowait at ffffffff813342dd #13 [ffff8810090d3ad0] blk_execute_rq at ffffffff8133442e #14 [ffff8810090d3b80] __nvme_submit_sync_cmd at ffffffffa02715d5 [nvme_core] #15 [ffff8810090d3bd0] nvmf_connect_io_queue at ffffffffa064d134 [nvme_fabrics] #16 [ffff8810090d3c80] nvme_rdma_reconnect_ctrl_work at ffffffffa065cafb [nvme_rdma] #17 [ffff8810090d3cb0] process_one_work at ffffffff810a1613 #18 [ffff8810090d3d90] worker_thread at ffffffff810a22ad #19 [ffff8810090d3ec0] kthread at ffffffff810a6dec #20 [ffff8810090d3f50] ret_from_fork at ffffffff816e3bbf ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 14:44 ` Steve Wise @ 2016-09-15 15:10 ` Steve Wise 2016-09-15 15:53 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-15 15:10 UTC (permalink / raw) > The state of the controller is NVME_CTRL_RECONNECTING. In fact, this BUG_ON() > happened on the reconnect worker thread. Ah, this is probably the connect > command on the admin queue maybe? > > > The queue being used at the crash is nvme_rdma_ctrl->queues[1]. IE not the admin queue. The reconnect work thread is connecting the io queues here: crash> gdb list *nvme_rdma_reconnect_ctrl_work+0x14b 0xffffffffa065cafb is in nvme_rdma_reconnect_ctrl_work (drivers/nvme/host/rdma.c:647). 642 { 643 int i, ret = 0; 644 645 for (i = 1; i < ctrl->queue_count; i++) { 646 ret = nvmf_connect_io_queue(&ctrl->ctrl, i); 647 if (ret) 648 break; 649 } 650 651 return ret; nvmf_connect_io_queue() is here: crash> gdb list *nvmf_connect_io_queue+0x114 0xffffffffa064d134 is in nvmf_connect_io_queue (drivers/nvme/host/fabrics.c:454). 449 strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); 450 451 ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &cqe, 452 data, sizeof(*data), 0, qid, 1, 453 BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT); 454 if (ret) { 455 nvmf_log_connect_error(ctrl, ret, le32_to_cpu(cqe.result), 456 &cmd, data); 457 } 458 kfree(data); The hctx passed into nvme_rdma_queue_rq() is in state BLK_MQ_S_TAG_ACTIVE. And hctx->driver_data is the nvme_rdma_queue to be used. That nvme_rdma_queue has a different hctx pointer (from my debug code) and that's why we hit the BUG_ON(). Anyway, nvme_rdma_queue->hctx->state is BLK_MQ_S_STOPPED. So this is more evidence that somehow an hctx is using an nvme_rdma_queue that wasn't originally assigned to that hctx... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 15:10 ` Steve Wise @ 2016-09-15 15:53 ` Steve Wise 2016-09-15 16:45 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-15 15:53 UTC (permalink / raw) > > The hctx passed into nvme_rdma_queue_rq() is in state BLK_MQ_S_TAG_ACTIVE. > And > hctx->driver_data is the nvme_rdma_queue to be used. That nvme_rdma_queue > has a > different hctx pointer (from my debug code) and that's why we hit the BUG_ON(). > Anyway, nvme_rdma_queue->hctx->state is BLK_MQ_S_STOPPED. So this is more > evidence that somehow an hctx is using an nvme_rdma_queue that wasn't > originally > assigned to that hctx... > I added this to my debug patch. I will warn on if an hctx is initialized twice or if it a 2nd hctx is bound to an nvme_rdma_queue that is already bound to another hctx: @@ -383,7 +384,12 @@ static int nvme_rdma_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, BUG_ON(hctx_idx >= ctrl->queue_count); + dev_warn(ctrl->ctrl.device, "%s hctx_idx %u hctx %p hctx->driver_data %p queue %p queue->hctx %p\n", + __func__, hctx_idx, hctx, hctx->driver_data, queue, queue->hctx); + WARN_ON_ONCE(hctx->driver_data); hctx->driver_data = queue; + WARN_ON_ONCE(queue->hctx); + queue->hctx = hctx; return 0; } And I see that 2 sets of blk_mq_hw_ctx structs get assigned to the same 32 queues. Here is the output for 1 target connect with 32 cores. So is it expected that the 32 nvme_rdma IO queues get assigned to 2 sets of hw_ctx structs? The 2nd set is getting initialized as part of namespace scanning... [ 652.782267] nvme nvme1: creating 32 I/O queues. 
[ 653.373979] nvme nvme1: nvme_rdma_init_hctx hctx_idx 0 hctx ffff880e75dd0ff8 hctx->driver_data (null) queue ffff880fce71e860 queue->hctx (null) [ 653.394416] nvme nvme1: nvme_rdma_init_hctx hctx_idx 1 hctx ffff880e75dd6a48 hctx->driver_data (null) queue ffff880fce71e8d8 queue->hctx (null) [ 653.414415] nvme nvme1: nvme_rdma_init_hctx hctx_idx 2 hctx ffff880e75dd1548 hctx->driver_data (null) queue ffff880fce71e950 queue->hctx (null) [ 653.434027] nvme nvme1: nvme_rdma_init_hctx hctx_idx 3 hctx ffff880e75dd64f8 hctx->driver_data (null) queue ffff880fce71e9c8 queue->hctx (null) [ 653.453489] nvme nvme1: nvme_rdma_init_hctx hctx_idx 4 hctx ffff880e75dd5fa8 hctx->driver_data (null) queue ffff880fce71ea40 queue->hctx (null) [ 653.472782] nvme nvme1: nvme_rdma_init_hctx hctx_idx 5 hctx ffff880e75dd1a98 hctx->driver_data (null) queue ffff880fce71eab8 queue->hctx (null) [ 653.491934] nvme nvme1: nvme_rdma_init_hctx hctx_idx 6 hctx ffff880e75dd1fe8 hctx->driver_data (null) queue ffff880fce71eb30 queue->hctx (null) [ 653.510957] nvme nvme1: nvme_rdma_init_hctx hctx_idx 7 hctx ffff880e75dd5a58 hctx->driver_data (null) queue ffff880fce71eba8 queue->hctx (null) [ 653.530070] nvme nvme1: nvme_rdma_init_hctx hctx_idx 8 hctx ffff880e75dd5508 hctx->driver_data (null) queue ffff880fce71ec20 queue->hctx (null) [ 653.551448] nvme nvme1: nvme_rdma_init_hctx hctx_idx 9 hctx ffff880e75dd2538 hctx->driver_data (null) queue ffff880fce71ec98 queue->hctx (null) [ 653.572524] nvme nvme1: nvme_rdma_init_hctx hctx_idx 10 hctx ffff880e75dd4fb8 hctx->driver_data (null) queue ffff880fce71ed10 queue->hctx (null) [ 653.593434] nvme nvme1: nvme_rdma_init_hctx hctx_idx 11 hctx ffff880e75dd4a68 hctx->driver_data (null) queue ffff880fce71ed88 queue->hctx (null) [ 653.614228] nvme nvme1: nvme_rdma_init_hctx hctx_idx 12 hctx ffff880e743e0008 hctx->driver_data (null) queue ffff880fce71ee00 queue->hctx (null) [ 653.634809] nvme nvme1: nvme_rdma_init_hctx hctx_idx 13 hctx ffff880e743e7a38 hctx->driver_data 
(null) queue ffff880fce71ee78 queue->hctx (null) [ 653.655223] nvme nvme1: nvme_rdma_init_hctx hctx_idx 14 hctx ffff880e743e74e8 hctx->driver_data (null) queue ffff880fce71eef0 queue->hctx (null) [ 653.675474] nvme nvme1: nvme_rdma_init_hctx hctx_idx 15 hctx ffff880e743e0558 hctx->driver_data (null) queue ffff880fce71ef68 queue->hctx (null) [ 653.695558] nvme nvme1: nvme_rdma_init_hctx hctx_idx 16 hctx ffff880e743e0aa8 hctx->driver_data (null) queue ffff880fce71efe0 queue->hctx (null) [ 653.715577] nvme nvme1: nvme_rdma_init_hctx hctx_idx 17 hctx ffff880e743e6f98 hctx->driver_data (null) queue ffff880fce71f058 queue->hctx (null) [ 653.735383] nvme nvme1: nvme_rdma_init_hctx hctx_idx 18 hctx ffff880e743e6a48 hctx->driver_data (null) queue ffff880fce71f0d0 queue->hctx (null) [ 653.755053] nvme nvme1: nvme_rdma_init_hctx hctx_idx 19 hctx ffff880e743e0ff8 hctx->driver_data (null) queue ffff880fce71f148 queue->hctx (null) [ 653.774627] nvme nvme1: nvme_rdma_init_hctx hctx_idx 20 hctx ffff880e743e1548 hctx->driver_data (null) queue ffff880fce71f1c0 queue->hctx (null) [ 653.794081] nvme nvme1: nvme_rdma_init_hctx hctx_idx 21 hctx ffff880e743e64f8 hctx->driver_data (null) queue ffff880fce71f238 queue->hctx (null) [ 653.813415] nvme nvme1: nvme_rdma_init_hctx hctx_idx 22 hctx ffff880e743e5fa8 hctx->driver_data (null) queue ffff880fce71f2b0 queue->hctx (null) [ 653.832665] nvme nvme1: nvme_rdma_init_hctx hctx_idx 23 hctx ffff880e743e1a98 hctx->driver_data (null) queue ffff880fce71f328 queue->hctx (null) [ 653.851791] nvme nvme1: nvme_rdma_init_hctx hctx_idx 24 hctx ffff880e743e1fe8 hctx->driver_data (null) queue ffff880fce71f3a0 queue->hctx (null) [ 653.870780] nvme nvme1: nvme_rdma_init_hctx hctx_idx 25 hctx ffff880e743e5a58 hctx->driver_data (null) queue ffff880fce71f418 queue->hctx (null) [ 653.889632] nvme nvme1: nvme_rdma_init_hctx hctx_idx 26 hctx ffff880e743e5508 hctx->driver_data (null) queue ffff880fce71f490 queue->hctx (null) [ 653.908318] nvme nvme1: 
nvme_rdma_init_hctx hctx_idx 27 hctx ffff880e743e2538 hctx->driver_data (null) queue ffff880fce71f508 queue->hctx (null) [ 653.927036] nvme nvme1: nvme_rdma_init_hctx hctx_idx 28 hctx ffff880e743e2a88 hctx->driver_data (null) queue ffff880fce71f580 queue->hctx (null) [ 653.945760] nvme nvme1: nvme_rdma_init_hctx hctx_idx 29 hctx ffff880e743e4fb8 hctx->driver_data (null) queue ffff880fce71f5f8 queue->hctx (null) [ 653.964475] nvme nvme1: nvme_rdma_init_hctx hctx_idx 30 hctx ffff880e743e4a68 hctx->driver_data (null) queue ffff880fce71f670 queue->hctx (null) [ 653.983174] nvme nvme1: nvme_rdma_init_hctx hctx_idx 31 hctx ffff880e743e2fd8 hctx->driver_data (null) queue ffff880fce71f6e8 queue->hctx (null) [ 654.192720] nvme nvme1: new ctrl: NQN "test-ram0", addr 10.0.1.14:4420 [ 654.205669] nvme nvme1: nvme_rdma_init_hctx hctx_idx 0 hctx ffff880e75dd2a88 hctx->driver_data (null) queue ffff880fce71e860 queue->hctx ffff880e75dd0ff8 [ 654.228208] ------------[ cut here ]------------ [ 654.236514] WARNING: CPU: 31 PID: 279 at drivers/nvme/host/rdma.c:391 nvme_rdma_init_hctx+0xb7/0xe0 [nvme_rdma] [ 654.250060] Modules linked in: nvme_rdma nvme_fabrics brd iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) 
mgag200(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4] [ 654.368142] CPU: 31 PID: 279 Comm: kworker/31:1 Tainted: G E 4.8.0-rc5-nvmf+block2-dbg+ #34 [ 654.379719] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 [ 654.388779] Workqueue: events nvme_scan_work [nvme_core] [ 654.396097] 0000000000000000 ffff881020dbb8b8 ffffffff8135b8b7 0000000000000117 [ 654.405476] 0000000000000000 0000000000000000 0000000000000000 ffff881020dbb908 [ 654.414792] ffffffff81086eed ffff881020dbb8e8 00000187d3748534 ffffffffa066d200 [ 654.424081] Call Trace: [ 654.428333] [<ffffffff8135b8b7>] dump_stack+0x67/0x90 [ 654.435266] [<ffffffff81086eed>] __warn+0xfd/0x120 [ 654.441894] [<ffffffff81086f2d>] warn_slowpath_null+0x1d/0x20 [ 654.449447] [<ffffffffa0669407>] nvme_rdma_init_hctx+0xb7/0xe0 [nvme_rdma] [ 654.458118] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0 [ 654.465625] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240 [ 654.473542] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410 [ 654.481885] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0 [ 654.489833] [<ffffffff8135ce84>] ? ida_pre_get+0xb4/0xe0 [ 654.496798] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70 [ 654.504097] [<ffffffffa0272998>] nvme_alloc_ns+0x88/0x240 [nvme_core] [ 654.512155] [<ffffffffa02728bc>] ? nvme_find_get_ns+0x5c/0xb0 [nvme_core] [ 654.520533] [<ffffffffa0273059>] nvme_validate_ns+0x79/0x90 [nvme_core] [ 654.528718] [<ffffffffa0273166>] nvme_scan_ns_list+0xf6/0x1f0 [nvme_core] [ 654.537059] [<ffffffffa027338b>] nvme_scan_work+0x12b/0x140 [nvme_core] [ 654.545199] [<ffffffff810a1613>] process_one_work+0x183/0x4d0 [ 654.552448] [<ffffffff816dfa40>] ? __schedule+0x1f0/0x5b0 [ 654.559314] [<ffffffff816dff00>] ? schedule+0x40/0xb0 [ 654.565810] [<ffffffff810a22ad>] worker_thread+0x16d/0x530 [ 654.572709] [<ffffffff810a2140>] ? 
maybe_create_worker+0x120/0x120 [ 654.580279] [<ffffffff816dfa40>] ? __schedule+0x1f0/0x5b0 [ 654.587061] [<ffffffff810cbab6>] ? __wake_up_common+0x56/0x90 [ 654.594146] [<ffffffff810a2140>] ? maybe_create_worker+0x120/0x120 [ 654.601644] [<ffffffff816dff00>] ? schedule+0x40/0xb0 [ 654.607998] [<ffffffff810a2140>] ? maybe_create_worker+0x120/0x120 [ 654.615459] [<ffffffff810a6dec>] kthread+0xcc/0xf0 [ 654.621505] [<ffffffff810b17ae>] ? schedule_tail+0x1e/0xc0 [ 654.628218] [<ffffffff816e3bbf>] ret_from_fork+0x1f/0x40 [ 654.634738] [<ffffffff810a6d20>] ? kthread_freezable_should_stop+0x70/0x70 [ 654.642845] ---[ end trace 83cb452b9aa631ae ]--- [ 654.648809] nvme nvme1: nvme_rdma_init_hctx hctx_idx 1 hctx ffff880e75dd2fd8 hctx->driver_data (null) queue ffff880fce71e8d8 queue->hctx ffff880e75dd6a48 [ 654.665803] nvme nvme1: nvme_rdma_init_hctx hctx_idx 2 hctx ffff880e75dd4518 hctx->driver_data (null) queue ffff880fce71e950 queue->hctx ffff880e75dd1548 [ 654.682808] nvme nvme1: nvme_rdma_init_hctx hctx_idx 3 hctx ffff880e75dd3528 hctx->driver_data (null) queue ffff880fce71e9c8 queue->hctx ffff880e75dd64f8 [ 654.699807] nvme nvme1: nvme_rdma_init_hctx hctx_idx 4 hctx ffff880e75dd3a78 hctx->driver_data (null) queue ffff880fce71ea40 queue->hctx ffff880e75dd5fa8 [ 654.716822] nvme nvme1: nvme_rdma_init_hctx hctx_idx 5 hctx ffff880ff084ea48 hctx->driver_data (null) queue ffff880fce71eab8 queue->hctx ffff880e75dd1a98 [ 654.733809] nvme nvme1: nvme_rdma_init_hctx hctx_idx 6 hctx ffff880e71c78008 hctx->driver_data (null) queue ffff880fce71eb30 queue->hctx ffff880e75dd1fe8 [ 654.750808] nvme nvme1: nvme_rdma_init_hctx hctx_idx 7 hctx ffff880e71c7fa38 hctx->driver_data (null) queue ffff880fce71eba8 queue->hctx ffff880e75dd5a58 [ 654.767860] nvme nvme1: nvme_rdma_init_hctx hctx_idx 8 hctx ffff880e71c7f4e8 hctx->driver_data (null) queue ffff880fce71ec20 queue->hctx ffff880e75dd5508 [ 654.785002] nvme nvme1: nvme_rdma_init_hctx hctx_idx 9 hctx ffff880e71c78558 hctx->driver_data 
(null) queue ffff880fce71ec98 queue->hctx ffff880e75dd2538 [ 654.802337] nvme nvme1: nvme_rdma_init_hctx hctx_idx 10 hctx ffff880e71c78aa8 hctx->driver_data (null) queue ffff880fce71ed10 queue->hctx ffff880e75dd4fb8 [ 654.819704] nvme nvme1: nvme_rdma_init_hctx hctx_idx 11 hctx ffff880e71c7ef98 hctx->driver_data (null) queue ffff880fce71ed88 queue->hctx ffff880e75dd4a68 [ 654.837312] nvme nvme1: nvme_rdma_init_hctx hctx_idx 12 hctx ffff880e71c7ea48 hctx->driver_data (null) queue ffff880fce71ee00 queue->hctx ffff880e743e0008 [ 654.854892] nvme nvme1: nvme_rdma_init_hctx hctx_idx 13 hctx ffff880e71c78ff8 hctx->driver_data (null) queue ffff880fce71ee78 queue->hctx ffff880e743e7a38 [ 654.872556] nvme nvme1: nvme_rdma_init_hctx hctx_idx 14 hctx ffff880e71c79548 hctx->driver_data (null) queue ffff880fce71eef0 queue->hctx ffff880e743e74e8 [ 654.890332] nvme nvme1: nvme_rdma_init_hctx hctx_idx 15 hctx ffff880e71c7e4f8 hctx->driver_data (null) queue ffff880fce71ef68 queue->hctx ffff880e743e0558 [ 654.908328] nvme nvme1: nvme_rdma_init_hctx hctx_idx 16 hctx ffff880e71c7dfa8 hctx->driver_data (null) queue ffff880fce71efe0 queue->hctx ffff880e743e0aa8 [ 654.926268] nvme nvme1: nvme_rdma_init_hctx hctx_idx 17 hctx ffff880e71c79a98 hctx->driver_data (null) queue ffff880fce71f058 queue->hctx ffff880e743e6f98 [ 654.944313] nvme nvme1: nvme_rdma_init_hctx hctx_idx 18 hctx ffff880e71c7da58 hctx->driver_data (null) queue ffff880fce71f0d0 queue->hctx ffff880e743e6a48 [ 654.962455] nvme nvme1: nvme_rdma_init_hctx hctx_idx 19 hctx ffff880e71c79fe8 hctx->driver_data (null) queue ffff880fce71f148 queue->hctx ffff880e743e0ff8 [ 654.980701] nvme nvme1: nvme_rdma_init_hctx hctx_idx 20 hctx ffff880e71c7a538 hctx->driver_data (null) queue ffff880fce71f1c0 queue->hctx ffff880e743e1548 [ 654.999107] nvme nvme1: nvme_rdma_init_hctx hctx_idx 21 hctx ffff880e71c7d508 hctx->driver_data (null) queue ffff880fce71f238 queue->hctx ffff880e743e64f8 [ 655.017552] nvme nvme1: nvme_rdma_init_hctx hctx_idx 
22 hctx ffff880e71c7aa88 hctx->driver_data (null) queue ffff880fce71f2b0 queue->hctx ffff880e743e5fa8 [ 655.036101] nvme nvme1: nvme_rdma_init_hctx hctx_idx 23 hctx ffff880e71c7afd8 hctx->driver_data (null) queue ffff880fce71f328 queue->hctx ffff880e743e1a98 [ 655.054759] nvme nvme1: nvme_rdma_init_hctx hctx_idx 24 hctx ffff880e71c7cfb8 hctx->driver_data (null) queue ffff880fce71f3a0 queue->hctx ffff880e743e1fe8 [ 655.073480] nvme nvme1: nvme_rdma_init_hctx hctx_idx 25 hctx ffff880e71c7ca68 hctx->driver_data (null) queue ffff880fce71f418 queue->hctx ffff880e743e5a58 [ 655.092201] nvme nvme1: nvme_rdma_init_hctx hctx_idx 26 hctx ffff880e71c7b528 hctx->driver_data (null) queue ffff880fce71f490 queue->hctx ffff880e743e5508 [ 655.110921] nvme nvme1: nvme_rdma_init_hctx hctx_idx 27 hctx ffff880e71c7ba78 hctx->driver_data (null) queue ffff880fce71f508 queue->hctx ffff880e743e2538 [ 655.129659] nvme nvme1: nvme_rdma_init_hctx hctx_idx 28 hctx ffff880e71c7c518 hctx->driver_data (null) queue ffff880fce71f580 queue->hctx ffff880e743e2a88 [ 655.148357] nvme nvme1: nvme_rdma_init_hctx hctx_idx 29 hctx ffff880e71c7bfc8 hctx->driver_data (null) queue ffff880fce71f5f8 queue->hctx ffff880e743e4fb8 [ 655.167104] nvme nvme1: nvme_rdma_init_hctx hctx_idx 30 hctx ffff880e71cc8008 hctx->driver_data (null) queue ffff880fce71f670 queue->hctx ffff880e743e4a68 [ 655.185814] nvme nvme1: nvme_rdma_init_hctx hctx_idx 31 hctx ffff880e71cc8558 hctx->driver_data (null) queue ffff880fce71f6e8 queue->hctx ffff880e743e2fd8 ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 15:53 ` Steve Wise @ 2016-09-15 16:45 ` Steve Wise 2016-09-15 20:58 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-15 16:45 UTC (permalink / raw) > And I see that 2 sets of blk_mq_hw_ctx structs get assigned to the same 32 > queues. Here is the output for 1 target connect with 32 cores. So is it > expected that the 32 nvme_rdma IO queues get assigned to 2 sets of hw_ctx > structs? The 2nd set is getting initialized as part of namespace scanning... So here is the stack for the first time the nvme_rdma_queue structs are bound to an hctx: [ 2006.826941] [<ffffffffa066c452>] nvme_rdma_init_hctx+0x102/0x110 [nvme_rdma] [ 2006.835409] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0 [ 2006.842530] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240 [ 2006.850097] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410 [ 2006.858107] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0 [ 2006.865765] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70 [ 2006.872829] [<ffffffffa066d9f9>] nvme_rdma_create_io_queues+0x189/0x210 [nvme_rdma] [ 2006.881917] [<ffffffffa066e813>] ? nvme_rdma_configure_admin_queue+0x1e3/0x290 [nvme_rdma] [ 2006.891611] [<ffffffffa066ec65>] nvme_rdma_create_ctrl+0x3a5/0x4c0 [nvme_rdma] [ 2006.900260] [<ffffffffa0654d33>] ? nvmf_create_ctrl+0x33/0x210 [nvme_fabrics] [ 2006.908799] [<ffffffffa0654e82>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics] [ 2006.917228] [<ffffffffa0654fbc>] nvmf_dev_write+0xac/0x110 [nvme_fabrics] And here is the 2nd time the same nvme_rdma_queue is bound to a different hctx:: [ 2007.263068] [<ffffffffa066c40c>] nvme_rdma_init_hctx+0xbc/0x110 [nvme_rdma] [ 2007.271656] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0 [ 2007.279027] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240 [ 2007.286829] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410 [ 2007.295066] [<ffffffff8132a969>] ? 
blk_alloc_queue_node+0x259/0x2c0 [ 2007.302962] [<ffffffff8135ce84>] ? ida_pre_get+0xb4/0xe0 [ 2007.309894] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70 [ 2007.317164] [<ffffffffa0272998>] nvme_alloc_ns+0x88/0x240 [nvme_core] [ 2007.325218] [<ffffffffa02728bc>] ? nvme_find_get_ns+0x5c/0xb0 [nvme_core] [ 2007.333612] [<ffffffffa0273059>] nvme_validate_ns+0x79/0x90 [nvme_core] [ 2007.341825] [<ffffffffa0273166>] nvme_scan_ns_list+0xf6/0x1f0 [nvme_core] [ 2007.350214] [<ffffffffa027338b>] nvme_scan_work+0x12b/0x140 [nvme_core] [ 2007.358427] [<ffffffff810a1613>] process_one_work+0x183/0x4d0 Is this expected? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 16:45 ` Steve Wise @ 2016-09-15 20:58 ` Steve Wise 2016-09-16 11:04 ` 'Christoph Hellwig' 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-15 20:58 UTC (permalink / raw) > > And I see that 2 sets of blk_mq_hw_ctx structs get assigned to the same 32 > > queues. Here is the output for 1 target connect with 32 cores. So is it > > expected that the 32 nvme_rdma IO queues get assigned to 2 sets of hw_ctx > > structs? The 2nd set is getting initialized as part of namespace scanning... > > > So here is the stack for the first time the nvme_rdma_queue structs are bound to > an hctx: > > [ 2006.826941] [<ffffffffa066c452>] nvme_rdma_init_hctx+0x102/0x110 > [nvme_rdma] > [ 2006.835409] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0 > [ 2006.842530] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240 > [ 2006.850097] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410 > [ 2006.858107] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0 > [ 2006.865765] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70 > [ 2006.872829] [<ffffffffa066d9f9>] nvme_rdma_create_io_queues+0x189/0x210 > [nvme_rdma] > [ 2006.881917] [<ffffffffa066e813>] ? > nvme_rdma_configure_admin_queue+0x1e3/0x290 [nvme_rdma] > [ 2006.891611] [<ffffffffa066ec65>] nvme_rdma_create_ctrl+0x3a5/0x4c0 > [nvme_rdma] > [ 2006.900260] [<ffffffffa0654d33>] ? nvmf_create_ctrl+0x33/0x210 > [nvme_fabrics] > [ 2006.908799] [<ffffffffa0654e82>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics] > [ 2006.917228] [<ffffffffa0654fbc>] nvmf_dev_write+0xac/0x110 [nvme_fabrics] > The above stack is creating hctx queues for the nvme_rdma_ctrl->ctrl.connect_q request queue. 
> And here is the 2nd time the same nvme_rdma_queue is bound to a different hctx:: > > [ 2007.263068] [<ffffffffa066c40c>] nvme_rdma_init_hctx+0xbc/0x110 > [nvme_rdma] > [ 2007.271656] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0 > [ 2007.279027] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240 > [ 2007.286829] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410 > [ 2007.295066] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0 > [ 2007.302962] [<ffffffff8135ce84>] ? ida_pre_get+0xb4/0xe0 > [ 2007.309894] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70 > [ 2007.317164] [<ffffffffa0272998>] nvme_alloc_ns+0x88/0x240 [nvme_core] > [ 2007.325218] [<ffffffffa02728bc>] ? nvme_find_get_ns+0x5c/0xb0 [nvme_core] > [ 2007.333612] [<ffffffffa0273059>] nvme_validate_ns+0x79/0x90 [nvme_core] > [ 2007.341825] [<ffffffffa0273166>] nvme_scan_ns_list+0xf6/0x1f0 [nvme_core] > [ 2007.350214] [<ffffffffa027338b>] nvme_scan_work+0x12b/0x140 [nvme_core] > [ 2007.358427] [<ffffffff810a1613>] process_one_work+0x183/0x4d0 > This stack is creating hctx queues for the namespace created for this target device. Sagi, Should nvme_rdma_error_recovery_work() be stopping the hctx queues for ctrl->ctrl.connect_q too? Something like: @@ -781,6 +790,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work) if (ctrl->queue_count > 1) nvme_stop_queues(&ctrl->ctrl); blk_mq_stop_hw_queues(ctrl->ctrl.admin_q); + blk_mq_stop_hw_queues(ctrl->ctrl.connect_q); /* We must take care of fastfail/requeue all our inflight requests */ if (ctrl->queue_count > 1) And then restart these after the nvme_rdma_queue rdma resources are reallocated? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 20:58 ` Steve Wise @ 2016-09-16 11:04 ` 'Christoph Hellwig' 2016-09-18 17:02 ` Sagi Grimberg 0 siblings, 1 reply; 53+ messages in thread From: 'Christoph Hellwig' @ 2016-09-16 11:04 UTC (permalink / raw) On Thu, Sep 15, 2016@03:58:25PM -0500, Steve Wise wrote: > This stack is creating hctx queues for the namespace created for this target > device. > > Sagi, > > Should nvme_rdma_error_recovery_work() be stopping the hctx queues for > ctrl->ctrl.connect_q too? Oh. Actually we'll probably need to take care of the connect_q just about anywhere we do anything to the other queues.. Let me look into the details a bit more. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-16 11:04 ` 'Christoph Hellwig' @ 2016-09-18 17:02 ` Sagi Grimberg 2016-09-19 15:38 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Sagi Grimberg @ 2016-09-18 17:02 UTC (permalink / raw) >> This stack is creating hctx queues for the namespace created for this target >> device. >> >> Sagi, >> >> Should nvme_rdma_error_recovery_work() be stopping the hctx queues for >> ctrl->ctrl.connect_q too? > > Oh. Actually we'll probably need to take care of the connect_q just > about anywhere we do anything to the other queues.. Why should we? We control the IOs on the connect_q (we only submit connect to it) and we only submit to it if our queue is established. I still don't see how this explains why Steve is seeing bogus queue/hctx mappings... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-18 17:02 ` Sagi Grimberg @ 2016-09-19 15:38 ` Steve Wise 2016-09-21 21:20 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-19 15:38 UTC (permalink / raw) > >> This stack is creating hctx queues for the namespace created for this target > >> device. > >> > >> Sagi, > >> > >> Should nvme_rdma_error_recovery_work() be stopping the hctx queues for > >> ctrl->ctrl.connect_q too? > > > > Oh. Actually we'll probably need to take care of the connect_q just > > about anywhere we do anything to the other queues.. > > Why should we? > > We control the IOs on the connect_q (we only submit connect to it) and > > we only submit to it if our queue is established. > > > > I still don't see how this explains why Steve is seeing bogus > > queue/hctx mappings... I don't think I'm seeing bogus mappings necessarily. I think my debug code uncovered (to me at least) that connect_q hctxs use the same nvme_rdma_queues as the ioq hctxs. And I thought that was not a valid configuration, but apparently it's normal. So I still don't know how/why a pending request gets run on an nvme_rdma_queue that has blown away its rdma qp and cm_id. It _could_ be due to queue/hctx bogus mappings, but I haven't proven it. I'm not sure how to prove it (or how to further debug this issue)... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-19 15:38 ` Steve Wise @ 2016-09-21 21:20 ` Steve Wise 2016-09-23 23:57 ` Sagi Grimberg 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-21 21:20 UTC (permalink / raw) > > > > > > Oh. Actually we'll probably need to take care of the connect_q just > > > about anywhere we do anything to the other queues.. > > > > Why should we? > > > > We control the IOs on the connect_q (we only submit connect to it) and > > we only submit to it if our queue is established. > > > > I still don't see how this explains why Steves is seeing bogus > > queue/hctx mappings... > > I don't think I'm seeing bogus mappings necessarily. I think my debug code > uncovered (to me at least) that connect_q hctx's use the same > nvme_rdma_queues > as the ioq hctxs. And I thought that was not a valid configuration, but > apparently its normal. So I still don't know how/why a pending request gets > run > on an nvme_rdma_queue that has blown away its rdma qp and cm_id. It > _could_ be > due to queue/hctx bogus mappings, but I haven't proven it. I'm not sure > how to > prove it (or how to further debug this issue)... I added debug code to save off the 2 blk_mq_hw_ctx pointers that get associated with each nvme_rdma_queue. This allows me to assert that the hctx passed into nvme_rdma_queue_rq() is not bogus. And indeed the hctx passed in during the crash is the correct hctx. So we know there isn't a problem with a bogus hctx being used. The hctx.state has BLK_MQ_S_TAG_ACTIVE set and _not_ BLK_MQ_S_STOPPED. The ns->queue->queue_flags has QUEUE_FLAG_STOPPED bit set. So the blk_mq queue is active and the nvme queue is STOPPED. I don't know how it gets in this state... Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-21 21:20 ` Steve Wise @ 2016-09-23 23:57 ` Sagi Grimberg 2016-09-26 15:12 ` 'Christoph Hellwig' 2016-09-27 14:07 ` Steve Wise 0 siblings, 2 replies; 53+ messages in thread From: Sagi Grimberg @ 2016-09-23 23:57 UTC (permalink / raw) > The hctx.state has BLK_MQ_S_TAG_ACTIVE set and _not_ BLK_MQ_S_STOPPED. The > ns->queue->queue_flags has QUEUE_FLAG_STOPPED bit set. So the blk_mq queue > is active and the nvme queue is STOPPED. I don't know how it gets in this > state... Christoph, I'm still trying to understand how it is possible to get to a point where the request queue is stopped while the hardware context is not... The code in rdma.c seems to do the right thing, but somehow a stray request sneaks into our submission path when it's not expected to. Steve, is the request a normal read/write? or is it something else triggered from the reconnect flow? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-23 23:57 ` Sagi Grimberg @ 2016-09-26 15:12 ` 'Christoph Hellwig' 2016-09-26 22:29 ` 'Christoph Hellwig' 2016-09-27 14:07 ` Steve Wise 1 sibling, 1 reply; 53+ messages in thread From: 'Christoph Hellwig' @ 2016-09-26 15:12 UTC (permalink / raw) On Fri, Sep 23, 2016@04:57:21PM -0700, Sagi Grimberg wrote: > I'm still trying to understand how it is possible to > get to a point where the request queue is stopped while > the hardware context is not... Not sure either, I need to deep dive into the code. But just a week or two ago Bart posted some fixes in this area for SRP, so that's where I will start my deep dive. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-26 15:12 ` 'Christoph Hellwig' @ 2016-09-26 22:29 ` 'Christoph Hellwig' 2016-09-27 15:11 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: 'Christoph Hellwig' @ 2016-09-26 22:29 UTC (permalink / raw) Steve, can you retest with the "Introduce blk_quiesce_queue() and blk_resume_queue()" series from Bart applied? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-26 22:29 ` 'Christoph Hellwig' @ 2016-09-27 15:11 ` Steve Wise 2016-09-27 15:31 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-27 15:11 UTC (permalink / raw) > > Steve, > > can you retest with the "Introduce blk_quiesce_queue() and blk_resume_queue()" > series from Bart applied? > Hey Christoph, To apply Bart's series, I needed to use Jens' for-4.9/block branch. But I also wanted the latest nvme fixes in linux-4.8-rc8, so I rebased Jens' branch onto rc8, then applied Bart's series (which needed a small tweak to patch 2). On top of this I have some debug patches that will BUG_ON() if it detects freed RDMA objects (requires mem debug on so freed memory has the 0x6b6b... stamp). This code base can be perused at: https://github.com/larrystevenwise/nvme-fabrics/commits/block-for-4.9 I then tried to reproduce, and still hit a crash. I'm debugging now. [ 414.649731] nvme_rdma: nvme_rdma_queue_rq BAZINGA! hctx ffff880fb8fc1548 ns ffff88101f049798 queue ffff880fbc1443a8 hctx1 ffff880fba47ca68 hctx2 ffff880fb8fc1548 rq ffff880fbb1736c0 req ffff880fbb173830 [ 414.649830] nvme_rdma: nvme_rdma_queue_rq BAZINGA! hctx ffff880fbc1164f8 ns ffff88101f049798 queue ffff880fbc144328 hctx1 ffff880fba47afd8 hctx2 ffff880fbc1164f8 rq ffff880fbb031240 req ffff880fbb0313b0 [ 414.649869] ------------[ cut here ]------------ [ 414.649870] kernel BUG at drivers/nvme/host/rdma.c:1434! 
[ 414.649872] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 414.649907] Modules linked in: nvme_rdma nvme_fabrics brd iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr ipmi_si ipmi_msghandler dm_mod i2c_i801 i2c_smbus sg lpc_ich mfd_core mei_me mei nvme nvme_core igb dca ptp [ 414.649916] pps_core wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4] [ 414.649918] CPU: 0 PID: 14173 Comm: fio Tainted: G W E 4.8.0-rc8-stevo-for-harsha-dbg+ #48 [ 414.649919] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 [ 414.649920] task: ffff880fa4410040 task.stack: ffff880fa44fc000 [ 414.649926] RIP: 0010:[<ffffffffa065eef0>] [<ffffffffa065eef0>] nvme_rdma_queue_rq+0x2d0/0x360 [nvme_rdma] [ 414.649927] RSP: 0018:ffff880fa44ff9a8 EFLAGS: 00010296 [ 414.649928] RAX: 00000000000000be RBX: ffff880fbc144328 RCX: 0000000000000000 [ 414.649929] RDX: 0000000000000001 RSI: 0000000000000286 RDI: 0000000000000286 [ 414.649930] RBP: ffff880fa44ffa08 R08: 000000000005aeb2 R09: ffffffff820360de [ 414.649931] R10: 00000000000000e0 R11: 000000000000000f R12: ffff880fbb031240 [ 414.649932] R13: ffff880fbc1164f8 R14: ffff880fbb0313b0 R15: ffff88101f049798 [ 414.649933] FS: 00007ffb2afc5720(0000) GS:ffff881036e00000(0000) 
knlGS:0000000000000000 [ 414.649934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 414.649935] CR2: 00007ffb0942e194 CR3: 0000000fa3636000 CR4: 00000000000406f0 [ 414.649936] Stack: [ 414.649939] ffff880fbc1164f8 ffff880fbb031240 ffff880fbb0313b0 ffff88103681f3f8 [ 414.649941] ffff880fa44ffa08 ffff880fbb7dcc88 ffff880fbb031240 ffff880fbb031240 [ 414.649942] ffff880fa44ffa78 0000000000000000 ffff880fb8f073e0 0000000000000001 [ 414.649943] Call Trace: [ 414.649948] [<ffffffff813394d1>] __blk_mq_run_hw_queue+0x251/0x400 [ 414.649952] [<ffffffff8115aa58>] ? trace_buffer_unlock_commit_regs+0x68/0x80 [ 414.649956] [<ffffffff8116c796>] ? trace_event_buffer_commit+0x146/0x1d0 [ 414.649959] [<ffffffff81330676>] ? trace_event_raw_event_block_rq+0x116/0x180 [ 414.649961] [<ffffffff81339742>] blk_mq_run_hw_queue+0x72/0x80 [ 414.649962] [<ffffffff81339cd2>] blk_mq_insert_requests+0x112/0x1d0 [ 414.649965] [<ffffffff81197991>] ? __generic_file_write_iter+0xd1/0x1f0 [ 414.649969] [<ffffffff81204cdc>] ? alloc_debug_processing+0x6c/0x1a0 [ 414.649971] [<ffffffff81339f16>] blk_mq_flush_plug_list+0x186/0x1a0 [ 414.649974] [<ffffffff8127c326>] ? aio_run_iocb+0x126/0x2d0 [ 414.649976] [<ffffffff8132f2c2>] blk_flush_plug_list+0x132/0x2e0 [ 414.649979] [<ffffffff812086e5>] ? kmem_cache_alloc+0x165/0x1e0 [ 414.649981] [<ffffffff8127c731>] ? io_submit_one+0x261/0x530 [ 414.649983] [<ffffffff8116c796>] ? trace_event_buffer_commit+0x146/0x1d0 [ 414.649984] [<ffffffff8132f674>] blk_finish_plug+0x34/0x50 [ 414.649986] [<ffffffff8127cb32>] do_io_submit+0x132/0x210 [ 414.649988] [<ffffffff8127cc20>] SyS_io_submit+0x10/0x20 [ 414.649992] [<ffffffff81003e7d>] do_syscall_64+0x7d/0x230 [ 414.649997] [<ffffffff8106f097>] ? 
do_page_fault+0x37/0x90 [ 414.650001] [<ffffffff816e3f61>] entry_SYSCALL64_slow_path+0x25/0x25 [ 414.650020] Code: 49 89 d8 4c 89 64 24 08 48 8b 43 78 4c 89 f9 4c 89 ea 48 c7 c6 50 11 66 a0 48 c7 c7 38 14 66 a0 48 89 04 24 31 c0 e8 d2 68 b3 e0 <0f> 0b eb fe 80 3d 39 33 00 00 01 0f 84 6d fd ff ff be 8f 05 00 [ 414.650023] RIP [<ffffffffa065eef0>] nvme_rdma_queue_rq+0x2d0/0x360 [nvme_rdma] [ 414.650023] RSP <ffff880fa44ff9a8> crash> gdb list *nvme_rdma_queue_rq+0x2d0 0xffffffffa065eef0 is in nvme_rdma_queue_rq (drivers/nvme/host/rdma.c:1434). 1429 if ((uintptr_t)queue->qp->device == (uintptr_t)(0x6b6b6b6b6b6b6b6bUL) || 1430 (uintptr_t)queue->qp->qp_num == (uintptr_t)(0x6b6b6b6b6b6b6b6bUL) || 1431 (uintptr_t)queue->ib_cq->comp_handler == (uintptr_t)(0x6b6b6b6b6b6b6b6bUL) || 1432 (uintptr_t)queue->ib_cq->device == (uintptr_t)(0x6b6b6b6b6b6b6b6bUL)) { 1433 pr_err("%s BAZINGA! hctx %p ns %p queue %p hctx1 %p hctx2 %p rq %p req %p\n", __func__, hctx, ns, queue, queue->hctx1, queue->hctx2, rq, req); 1434 BUG_ON(1); 1435 } 1436 dev = queue->device->dev; 1437 ib_dma_sync_single_for_cpu(dev, sqe->dma, 1438 sizeof(struct nvme_command), DMA_TO_DEVICE); crash> nvme_rdma_request ffff880fbb0313b0 struct nvme_rdma_request { mr = 0xffff880fbb7d9ad8, sqe = { cqe = { done = 0xffffffffa065d670 <nvme_rdma_send_done> }, data = 0xffff880fbb7dcc88, dma = 67570093192 }, sge = {{ addr = 67570093192, length = 64, lkey = 0 }, { addr = 67902048256, length = 2048, lkey = 0 }}, num_sge = 2, nents = 1, inline_data = true, reg_wr = { wr = { next = 0xffff880fb65bf518, { wr_id = 18446612199876269112, wr_cqe = 0xffff880fbb031438 }, sg_list = 0x0, num_sge = 0, opcode = IB_WR_REG_MR, send_flags = 0, ex = { imm_data = 0, invalidate_rkey = 0 } }, mr = 0xffff880fbb7d9ad8, key = 40554307, access = 7 }, reg_cqe = { done = 0xffffffffa065d6b0 <nvme_rdma_memreg_done> }, queue = 0xffff880fbc144328, sg_table = { sgl = 0xffff880fbb031458, nents = 1, orig_nents = 1 }, first_sgl = 0xffff880fbb031458 } crash> 
nvme_rdma_queue 0xffff880fbc144328 struct nvme_rdma_queue { rsp_ring = 0xffff880fbe81c548, sig_count = 60 '<', queue_size = 128, cmnd_capsule_len = 4160, ctrl = 0xffff880fc98eca88, device = 0xffff880ff0b304b8, ib_cq = 0xffff880fbcac3508, qp = 0xffff880fbc1874e8, flags = 16, cm_id = 0xffff880fbc181548, cm_error = 0, cm_done = { done = 0, wait = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, task_list = { next = 0xffff880fbc144388, prev = 0xffff880fbc144388 } } }, hctx1 = 0xffff880fba47afd8, hctx2 = 0xffff880fbc1164f8 } crash> gdb x/1g 0xffff880fbcac3508 0xffff880fbcac3508: 0x6b6b6b6b6b6b6b6b crash> ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-27 15:11 ` Steve Wise @ 2016-09-27 15:31 ` Steve Wise 0 siblings, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-27 15:31 UTC (permalink / raw) > Hey Christoph, > > To apply Bart's series, I needed to use Jens' for-4.9/block branch. But I also > wanted the latest nvme fixes in linux-4.8-rc8, so I rebased Jens' branch onto > rc8, then applied Bart's series (which needed a small tweak to patch 2). On top > of this I have some debug patches that will BUG_ON() if it detects freed RDMA > objects (requires mem debug on so freed memory has the 0x6b6b... stamp). This > code base can be perused at: > > https://github.com/larrystevenwise/nvme-fabrics/commits/block-for-4.9 > > I then tried to reproduce, and still hit a crash. I'm debugging now. > blk_mq_hw_ctx.state is: 2 nvme_ns.queue.queue_flags is: 0x1f07a00 So the hw_ctx is BLK_MQ_S_TAG_ACTIVE. And the nvme_ns.queue request queue doesn't have QUEUE_FLAG_STOPPED set. nvme_rdma_ctrl.ctrl state is RECONNECTING. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-23 23:57 ` Sagi Grimberg 2016-09-26 15:12 ` 'Christoph Hellwig' @ 2016-09-27 14:07 ` Steve Wise 1 sibling, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-27 14:07 UTC (permalink / raw) > Christoph, > > I'm still trying to understand how it is possible to > get to a point where the request queue is stopped while > the hardware context is not... > > The code in rdma.c seems to do the right thing, but somehow > a stray request sneaks in to our submission path when its not > expected to. > > Steve, is the request a normal read/write? or is it something > else triggered from the reconnect flow? It is a normal IO request I think. length 64. 1 sge. Sometimes I see a REG_MR also filled out in the nvme_rdma_request->reg_wr struct. I'm going to try Bart's series now to see if it fixes this issue... Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-12 20:10 ` Steve Wise [not found] ` <da2e918b-0f18-e032-272d-368c6ec49c62@gri mberg.me> 2016-09-15 9:53 ` Sagi Grimberg @ 2016-09-15 14:00 ` Gabriel Krisman Bertazi 2016-09-15 14:31 ` Steve Wise 2 siblings, 1 reply; 53+ messages in thread From: Gabriel Krisman Bertazi @ 2016-09-15 14:00 UTC (permalink / raw) "Steve Wise" <swise at opengridcomputing.com> writes: > @@ -622,6 +625,7 @@ static void nvme_rdma_stop_and_free_queue(struct > nvme_rdma_queue *queue) > { > if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) > return; > + BUG_ON(!test_bit(BLK_MQ_S_STOPPED, &queue->hctx->state)); > nvme_rdma_stop_queue(queue); > nvme_rdma_free_queue(queue); > } > @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx, > > WARN_ON_ONCE(rq->tag < 0); > > + BUG_ON(hctx != queue->hctx); > + BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state)); > dev = queue->device->dev; > ib_dma_sync_single_for_cpu(dev, sqe->dma, > sizeof(struct nvme_command), DMA_TO_DEVICE); > This reminds me of the discussion I had with Jens a few weeks ago here: http://lists.infradead.org/pipermail/linux-nvme/2016-August/005916.html The BUG_ON I hit is similar to yours, but for nvme over PCI. I think the update queues code will reach a similar path of remapping, but I didnt go out and check yet. Can you check you are running with the patch he mentioned at: http://lists.infradead.org/pipermail/linux-nvme/2016-August/005962.html Thanks, -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-15 14:00 ` Gabriel Krisman Bertazi @ 2016-09-15 14:31 ` Steve Wise 0 siblings, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-15 14:31 UTC (permalink / raw) > > This reminds me of the discussion I had with Jens a few weeks ago here: > > http://lists.infradead.org/pipermail/linux-nvme/2016-August/005916.html > > The BUG_ON I hit is similar to yours, but for nvme over PCI. I think > the update queues code will reach a similar path of remapping, but I > didnt go out and check yet. > > Can you check you are running with the patch he mentioned at: > > http://lists.infradead.org/pipermail/linux-nvme/2016-August/005962.html I don't have this patch, and blk_mq_hctx_cpu_offline() is very different from the code in that patch. I'm using 4.8-rc5 plus these patches from nvmf-4.8-rc and linux-block. This matches Jens' for-linus tag at this point in time. It will all hit -rc6. cbee748 nvme-rdma: add back dependency on CONFIG_BLOCK b8ce92e nvme-rdma: fix null pointer dereference on req->mr c51b3c7 nvme-rdma: use ib_client API to detect device removal 2b24ee4 nvme-rdma: add DELETING queue flag 8c7d713 nvme-rdma: destroy nvme queue rdma resources on connect failure 4604d4e nvme_rdma: keep a ref on the ctrl during delete/flush 372a82d iw_cxgb4: block module unload until all ep resources are released 7053902 iw_cxgb4: call dev_put() on l2t allocation failure 82469c5 nvme: Don't suspend admin queue that wasn't created c693593 Linux 4.8-rc5 ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-04 9:17 ` Sagi Grimberg 2016-09-07 21:08 ` Steve Wise @ 2016-09-07 21:33 ` Steve Wise 2016-09-08 8:22 ` Sagi Grimberg [not found] ` <9fd1f090-3b86-b496-d8c0-225ac0815fbe@grimbe rg.me> 1 sibling, 2 replies; 53+ messages in thread From: Steve Wise @ 2016-09-07 21:33 UTC (permalink / raw) > Now, given that you already verified that the queues are stopped with > BLK_MQ_S_STOPPED, I'm looking at blk-mq now. > > I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take > BLK_MQ_S_STOPPED into account. Theoretically if we free the queue > pairs after we passed these checks while the rq_list is being processed > then we can end-up with this condition, but given that it takes > essentially forever (10 seconds) I tend to doubt this is the case. > > HCH, Jens, Keith, any useful pointers for us? > > To summarize we see a stray request being queued long after we set > BLK_MQ_S_STOPPED (and by long I mean 10 seconds). Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the tag set for that queue as part of reconnecting? ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-07 21:33 ` Steve Wise @ 2016-09-08 8:22 ` Sagi Grimberg 2016-09-08 17:19 ` Steve Wise [not found] ` <9fd1f090-3b86-b496-d8c0-225ac0815fbe@grimbe rg.me> 1 sibling, 1 reply; 53+ messages in thread From: Sagi Grimberg @ 2016-09-08 8:22 UTC (permalink / raw) >> Now, given that you already verified that the queues are stopped with >> BLK_MQ_S_STOPPED, I'm looking at blk-mq now. >> >> I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take >> BLK_MQ_S_STOPPED into account. Theoretically if we free the queue >> pairs after we passed these checks while the rq_list is being processed >> then we can end-up with this condition, but given that it takes >> essentially forever (10 seconds) I tend to doubt this is the case. >> >> HCH, Jens, Keith, any useful pointers for us? >> >> To summarize we see a stray request being queued long after we set >> BLK_MQ_S_STOPPED (and by long I mean 10 seconds). > > Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the tag set > for that queue as part of reconnecting? I don't see how that'd help... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-08 8:22 ` Sagi Grimberg @ 2016-09-08 17:19 ` Steve Wise 2016-09-09 15:57 ` Steve Wise 0 siblings, 1 reply; 53+ messages in thread From: Steve Wise @ 2016-09-08 17:19 UTC (permalink / raw) > >> Now, given that you already verified that the queues are stopped with > >> BLK_MQ_S_STOPPED, I'm looking at blk-mq now. > >> > >> I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take > >> BLK_MQ_S_STOPPED into account. Theoretically if we free the queue > >> pairs after we passed these checks while the rq_list is being processed > >> then we can end-up with this condition, but given that it takes > >> essentially forever (10 seconds) I tend to doubt this is the case. > >> > >> HCH, Jens, Keith, any useful pointers for us? > >> > >> To summarize we see a stray request being queued long after we set > >> BLK_MQ_S_STOPPED (and by long I mean 10 seconds). > > > > Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the tag set > > for that queue as part of reconnecting? > > I don't see how that'd help... > I can't explain this, but the nvme_rdma_queue.flags field has a bit set that shouldn't be set: crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8 flags = 0x14 Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never be... enum nvme_rdma_queue_flags { NVME_RDMA_Q_CONNECTED = (1 << 0), NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1), NVME_RDMA_Q_DELETING = (1 << 2), }; The rest of the structure looks fine. I've also seen crash dumps where bit 3 is set which is also not used. /me confused... ^ permalink raw reply [flat|nested] 53+ messages in thread
* nvmf/rdma host crash during heavy load and keep alive recovery 2016-09-08 17:19 ` Steve Wise @ 2016-09-09 15:57 ` Steve Wise 0 siblings, 0 replies; 53+ messages in thread From: Steve Wise @ 2016-09-09 15:57 UTC (permalink / raw) > > > > I can't explain this, but the nvme_rdma_queue.flags field has a bit set that > shouldn't be set: > > crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8 > flags = 0x14 > > Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never be... > > enum nvme_rdma_queue_flags { > NVME_RDMA_Q_CONNECTED = (1 << 0), > NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1), > NVME_RDMA_Q_DELETING = (1 << 2), > }; > > The rest of the structure looks fine. I've also seen crash dumps where bit 3 is > set which is also not used. > > /me confused... I'm dumb: 1<<0 is 1, so CONNECTED is bit 1, QUEUE_ALLOCATED is bit 2, and Q_DELETING is bit 4! Bits 0 and 3 are not used. So 0x14 is bits 4 and 2: DELETING and QUEUE_ALLOCATED. The queue_flags enum should not be using the (1<<X) initialization. Rather, they should be 0, 1, 2, etc... ^ permalink raw reply [flat|nested] 53+ messages in thread
[parent not found: <9fd1f090-3b86-b496-d8c0-225ac0815fbe@grimberg.me>]
[parent not found: <01bc01d209f5$1b7d7510$52785f30$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery
  [not found] ` <01bc01d209f5$1b7d7510$52785f30$@opengridcomputing.com>
@ 2016-09-08 19:15 ` Steve Wise
  0 siblings, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-09-08 19:15 UTC (permalink / raw)

> > >> Now, given that you already verified that the queues are stopped with
> > >> BLK_MQ_S_STOPPED, I'm looking at blk-mq now.
> > >>
> > >> I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take
> > >> BLK_MQ_S_STOPPED into account. Theoretically if we free the queue
> > >> pairs after we passed these checks while the rq_list is being processed
> > >> then we can end up with this condition, but given that it takes
> > >> essentially forever (10 seconds) I tend to doubt this is the case.
> > >>
> > >> HCH, Jens, Keith, any useful pointers for us?
> > >>
> > >> To summarize we see a stray request being queued long after we set
> > >> BLK_MQ_S_STOPPED (and by long I mean 10 seconds).
> > >
> > > Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the
> > > tag set for that queue as part of reconnecting?
> >
> > I don't see how that'd help...
>
> I can't explain this, but the nvme_rdma_queue.flags field has a bit set
> that shouldn't be set:
>
> crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8
>   flags = 0x14
>
> Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should
> never be...
>
> enum nvme_rdma_queue_flags {
>         NVME_RDMA_Q_CONNECTED           = (1 << 0),
>         NVME_RDMA_IB_QUEUE_ALLOCATED    = (1 << 1),
>         NVME_RDMA_Q_DELETING            = (1 << 2),
> };
>
> The rest of the structure looks fine. I've also seen crash dumps where
> bit 3 is set, which is also not used.
>
> /me confused...

While working this with debug code to verify that we never create a qp,
cq, or cm_id where one already exists for an nvme_rdma_queue, I
discovered a bug where the Q_DELETING flag is never cleared, and thus a
reconnect can leak qps and cm_ids. The fix, I think, is this:

@@ -563,6 +572,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 	int ret;

 	queue = &ctrl->queues[idx];
+	queue->flags = 0;
 	queue->ctrl = ctrl;
 	init_completion(&queue->cm_done);

I think maybe the clearing of the Q_DELETING flag was lost when we moved
to using the ib_client for device removal. I'll polish this up and
submit a patch. It should go with the next 4.8-rc push I think.

This doesn't resolve the original failure I'm chasing in this thread
though :(

Steve.

^ permalink raw reply	[flat|nested] 53+ messages in thread
[parent not found: <01f201d20a05$6abde5f0$4039b1d0$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery
  [not found] ` <01f201d20a05$6abde5f0$4039b1d0$@opengridcomputing.com>
@ 2016-09-08 19:26 ` Steve Wise
  [not found] ` <01f401d20a06$d4cc8360$7e658a20$@opengridcomputing.com>
  1 sibling, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-09-08 19:26 UTC (permalink / raw)

> While working this with debug code to verify that we never create a qp,
> cq, or cm_id where one already exists for an nvme_rdma_queue, I
> discovered a bug where the Q_DELETING flag is never cleared, and thus a
> reconnect can leak qps and cm_ids. The fix, I think, is this:
>
> @@ -563,6 +572,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>  	int ret;
>
>  	queue = &ctrl->queues[idx];
> +	queue->flags = 0;
>  	queue->ctrl = ctrl;
>  	init_completion(&queue->cm_done);
>
> I think maybe the clearing of the Q_DELETING flag was lost when we moved
> to using the ib_client for device removal. I'll polish this up and
> submit a patch. It should go with the next 4.8-rc push I think.

Actually, I think the Q_DELETING flag is no longer needed. Sagi, can you
have a look at NVME_RDMA_Q_DELETING in the latest code? I think the
ib_client patch made the original Q_DELETING patch obsolete. And the
original Q_DELETING patch probably needed the above chunk for
correctness...

Let me know if you want me to submit something for this issue. We could
fix the original patches as they are still only in your nvmf-4.8-rc
repo...

Steve.

^ permalink raw reply	[flat|nested] 53+ messages in thread
[parent not found: <01f401d20a06$d4cc8360$7e658a20$@opengridcomputing.com>]
* nvmf/rdma host crash during heavy load and keep alive recovery
  [not found] ` <01f401d20a06$d4cc8360$7e658a20$@opengridcomputing.com>
@ 2016-09-08 20:44 ` Steve Wise
  0 siblings, 0 replies; 53+ messages in thread
From: Steve Wise @ 2016-09-08 20:44 UTC (permalink / raw)

> > While working this with debug code to verify that we never create a qp,
> > cq, or cm_id where one already exists for an nvme_rdma_queue, I
> > discovered a bug where the Q_DELETING flag is never cleared, and thus a
> > reconnect can leak qps and cm_ids. The fix, I think, is this:
> >
> > @@ -563,6 +572,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
> >  	int ret;
> >
> >  	queue = &ctrl->queues[idx];
> > +	queue->flags = 0;
> >  	queue->ctrl = ctrl;
> >  	init_completion(&queue->cm_done);
> >
> > I think maybe the clearing of the Q_DELETING flag was lost when we moved
> > to using the ib_client for device removal. I'll polish this up and
> > submit a patch. It should go with the next 4.8-rc push I think.
>
> Actually, I think the Q_DELETING flag is no longer needed. Sagi, can you
> have a look at NVME_RDMA_Q_DELETING in the latest code? I think the
> ib_client patch made the original Q_DELETING patch obsolete. And the
> original Q_DELETING patch probably needed the above chunk for
> correctness...
>
> Let me know if you want me to submit something for this issue. We could
> fix the original patches as they are still only in your nvmf-4.8-rc
> repo...

I see your debug patch v2 you sent me in an earlier email has a
clear_bit() to address the DELETING issue. I haven't tried that patch
yet. :) It's next on my list...

steve

^ permalink raw reply	[flat|nested] 53+ messages in thread
end of thread, other threads:[~2016-09-27 15:31 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-29 21:40 nvmf/rdma host crash during heavy load and keep alive recovery Steve Wise
2016-08-01 11:06 ` Christoph Hellwig
2016-08-01 14:26 ` Steve Wise
2016-08-01 21:38 ` Steve Wise
[not found] ` <015801d1ec3d$0ca07ea0$25e17be0$@opengridcomputing.com>
2016-08-10 15:46 ` Steve Wise
[not found] ` <010f01d1f31e$50c8cb40$f25a61c0$@opengridcomputing.com>
2016-08-10 16:00 ` Steve Wise
[not found] ` <013701d1f320$57b185d0$07149170$@opengridcomputing.com>
2016-08-10 17:20 ` Steve Wise
2016-08-10 18:59 ` Steve Wise
2016-08-11  6:27 ` Sagi Grimberg
2016-08-11 13:58 ` Steve Wise
2016-08-11 14:19 ` Steve Wise
2016-08-11 14:40 ` Steve Wise
2016-08-11 15:53 ` Steve Wise
[not found] ` <00fe01d1f3e8$8992b330$9cb81990$@opengridcomputing.com>
2016-08-15 14:39 ` Steve Wise
2016-08-16  9:26 ` Sagi Grimberg
2016-08-16 21:17 ` Steve Wise
2016-08-17 18:57 ` Sagi Grimberg
2016-08-17 19:07 ` Steve Wise
2016-09-01 19:14 ` Steve Wise
2016-09-04  9:17 ` Sagi Grimberg
2016-09-07 21:08 ` Steve Wise
2016-09-08  7:45 ` Sagi Grimberg
2016-09-08 20:47 ` Steve Wise
2016-09-08 21:00 ` Steve Wise
[not found] ` <7f09e373-6316-26a3-ae81-dab1205d88ab@grimberg.me>
[not found] ` <021201d20a14$0f203b80$2d60b280$@opengridcomputing.com>
2016-09-08 21:21 ` Steve Wise
[not found] ` <021401d20a16$ed60d470$c8227d50$@opengridcomputing.com>
[not found] ` <021501d20a19$327ba5b0$9772f110$@opengridcomputing.com>
2016-09-08 21:37 ` Steve Wise
2016-09-09 15:50 ` Steve Wise
2016-09-12 20:10 ` Steve Wise
[not found] ` <da2e918b-0f18-e032-272d-368c6ec49c62@grimberg.me>
2016-09-15  9:53 ` Sagi Grimberg
2016-09-15 14:44 ` Steve Wise
2016-09-15 15:10 ` Steve Wise
2016-09-15 15:53 ` Steve Wise
2016-09-15 16:45 ` Steve Wise
2016-09-15 20:58 ` Steve Wise
2016-09-16 11:04 ` 'Christoph Hellwig'
2016-09-18 17:02 ` Sagi Grimberg
2016-09-19 15:38 ` Steve Wise
2016-09-21 21:20 ` Steve Wise
2016-09-23 23:57 ` Sagi Grimberg
2016-09-26 15:12 ` 'Christoph Hellwig'
2016-09-26 22:29 ` 'Christoph Hellwig'
2016-09-27 15:11 ` Steve Wise
2016-09-27 15:31 ` Steve Wise
2016-09-27 14:07 ` Steve Wise
2016-09-15 14:00 ` Gabriel Krisman Bertazi
2016-09-15 14:31 ` Steve Wise
2016-09-07 21:33 ` Steve Wise
2016-09-08  8:22 ` Sagi Grimberg
2016-09-08 17:19 ` Steve Wise
2016-09-09 15:57 ` Steve Wise
[not found] ` <9fd1f090-3b86-b496-d8c0-225ac0815fbe@grimberg.me>
[not found] ` <01bc01d209f5$1b7d7510$52785f30$@opengridcomputing.com>
2016-09-08 19:15 ` Steve Wise
[not found] ` <01f201d20a05$6abde5f0$4039b1d0$@opengridcomputing.com>
2016-09-08 19:26 ` Steve Wise
[not found] ` <01f401d20a06$d4cc8360$7e658a20$@opengridcomputing.com>
2016-09-08 20:44 ` Steve Wise