From: "Steve Wise"
To: "'Sagi Grimberg'", "'Christoph Hellwig'", ...
Cc: ..., "'Armen Baloyan'", "'Jay Freyensee'", "'Ming Lin'", ...
Subject: RE: [PATCH 4/5] nvmet-rdma: add a NVMe over Fabrics RDMA target driver
Date: Tue, 14 Jun 2016 11:22:25 -0500
Message-ID: <00f701d1c658$efd720d0$cf856270$@opengridcomputing.com>
In-Reply-To: <00f501d1c657$483a74e0$d8af5ea0$@opengridcomputing.com>
References: <1465248215-18186-1-git-send-email-hch@lst.de> <1465248215-18186-5-git-send-email-hch@lst.de> <5756B75C.9000409@lightbits.io> <057a01d1c2a3$3082eec0$9188cc40$@opengridcomputing.com> <00f501d1c657$483a74e0$d8af5ea0$@opengridcomputing.com>

>
> Hey Sean,
>
> Am I correct here? I.e., is it ok for the rdma application to rdma_reject()
> and rdma_destroy_id() the CONNECT_REQUEST cm_id _inside_ its event handler,
> as long as it returns 0?
>
> Thanks,
>
> Steve.

Looking at rdma_destroy_id(), I think it is invalid to call it from the event
handler:

void rdma_destroy_id(struct rdma_cm_id *id)
{
	...
	/*
	 * Wait for any active callback to finish.  New callbacks will find
	 * the id_priv state set to destroying and abort.
	 */
	mutex_lock(&id_priv->handler_mutex);
	mutex_unlock(&id_priv->handler_mutex);
	...

And indeed when I tried to destroy the CONNECT_REQUEST cm_id in the nvmet
event handler, I see the event handler thread is stuck:

INFO: task kworker/u32:0:6275 blocked for more than 120 seconds.
      Tainted: G            E   4.7.0-rc2-nvmf-all.3+ #81
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u32:0   D ffff880f90737768     0  6275      2 0x10000080
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
 ffff880f90737768 ffff880f907376d8 ffffffff81c0b500 0000000000000005
 ffff8810226a4940 ffff88102b894490 ffffffffa02cf4cd ffff880f00000000
 ffff880fcd917c00 ffff880f00000000 0000000000000004 ffff880f00000000
Call Trace:
 [] ? stop_ep_timer+0x2d/0xe0 [iw_cxgb4]
 [] schedule+0x47/0xc0
 [] ? iw_cm_reject+0x96/0xe0 [iw_cm]
 [] schedule_preempt_disabled+0x15/0x20
 [] __mutex_lock_slowpath+0x108/0x310
 [] mutex_lock+0x31/0x50
 [] rdma_destroy_id+0x38/0x200 [rdma_cm]
 [] ? nvmet_rdma_queue_connect+0x1a0/0x1a0 [nvmet_rdma]
 [] ? rdma_create_id+0x171/0x1a0 [rdma_cm]
 [] nvmet_rdma_cm_handler+0x108/0x168 [nvmet_rdma]
 [] iw_conn_req_handler+0x1ca/0x240 [rdma_cm]
 [] cm_conn_req_handler+0x606/0x680 [iw_cm]
 [] process_event+0xc9/0xf0 [iw_cm]
 [] cm_work_handler+0x147/0x1c0 [iw_cm]
 [] ? trace_event_raw_event_workqueue_execute_start+0x66/0xa0
 [] process_one_work+0x1c6/0x550
...

So I withdraw my comment about nvmet: I think the code is fine as-is.  The
second reject ends up a no-op, since the connection request has already been
rejected by nvmet.

Steve.
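
For what it's worth, here is a minimal sketch of the pattern that avoids the
deadlock above: reject the CONNECT_REQUEST from inside the handler and return
a non-zero value so the rdma_cm core tears down the cm_id after the callback
returns, rather than calling rdma_destroy_id() in handler context.  This is
only an illustration, not the nvmet-rdma code; example_cm_handler is a
made-up name, and it assumes the three-argument rdma_reject(id, private_data,
private_data_len) signature from this era.

#include <linux/errno.h>
#include <rdma/rdma_cm.h>

static int example_cm_handler(struct rdma_cm_id *cm_id,
			      struct rdma_cm_event *event)
{
	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
		/*
		 * Rejecting from the handler is fine, but calling
		 * rdma_destroy_id() here would deadlock: the callback runs
		 * with handler_mutex held, and rdma_destroy_id() takes it
		 * again (see the hung-task trace above).
		 */
		rdma_reject(cm_id, NULL, 0);

		/*
		 * A non-zero return asks the rdma_cm core to destroy the
		 * CONNECT_REQUEST cm_id once the callback has returned, so
		 * the handler never destroys it directly.
		 */
		return -ECONNREFUSED;
	}

	return 0;
}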