From: Hou Pu <houpu@bytedance.com>
To: Michael Christie <michael.christie@oracle.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] iscsi-target: fix hang in iscsit_access_np() when getting tpg->np_login_sem
Date: Wed, 02 Sep 2020 04:16:03 +0000
Message-ID: <e655c868-966d-1846-6bd8-19671cf966d4@bytedance.com>
In-Reply-To: <24875CC6-70FA-477D-BB74-51FBFDD96732@oracle.com>

On 2020/9/2 10:57 AM, Michael Christie wrote:
>
>
>> On Jul 29, 2020, at 8:03 AM, Hou Pu <houpu@bytedance.com> wrote:
>>
>> The iscsi target login thread might get stuck with the following stack:
>>
>> cat /proc/`pidof iscsi_np`/stack
>> [<0>] down_interruptible+0x42/0x50
>> [<0>] iscsit_access_np+0xe3/0x167
>> [<0>] iscsi_target_locate_portal+0x695/0x8ac
>> [<0>] __iscsi_target_login_thread+0x855/0xb82
>> [<0>] iscsi_target_login_thread+0x2f/0x5a
>> [<0>] kthread+0xfa/0x130
>> [<0>] ret_from_fork+0x1f/0x30
>>
>> This can be reproduced with the following steps:
>> 1. Initiator A tries to log in to iqn1-tpg1 on port 3260. After the
>>    PDU exchange in the login thread finishes, but before negotiation
>>    is finished, the network link goes down. This can happen in a
>>    production environment; I emulated it by bringing the network
>>    card down on the initiator node with ifconfig eth0 down.
>>    (Now A can never finish this login, and tpg->np_login_sem is
>>    held by it.)
>> 2. Initiator B tries to log in to iqn2-tpg1 on port 3260. After the
>>    PDU exchange in the login thread finishes, the target expects to
>>    process the remaining login PDUs in workqueue context.
>> 3. Initiator A' tries to re-login to iqn1-tpg1 on port 3260 from
>>    a new socket. It waits for tpg->np_login_sem with
>>    np->np_login_timer started, so it should wait at most 15 seconds.
>>    (Because the lock is held by A and A never gets a chance to
>>    release tpg->np_login_sem, A' should eventually time out.)
>> 4. Before A' times out, initiator B fails negotiation and
>>    calls iscsi_target_login_drop()->iscsi_target_login_sess_out().
>>    The np->np_login_timer is canceled, and initiator A' will hang
>>    there forever. Because A' is now in the login thread, no other
>>    login requests can be serviced.
>
> iqn1 and iqn2 are different targets, right? It's not clear to me how,
> when initiator B fails negotiation, it cancels the timer for the
> portal under a different iqn/target.

iqn1-tpg1 in steps 1 and 3 is the same one (the same target volume).
iqn2-tpg1 in step 2 is a different volume on the same host.
The configuration looks like this:

iqn1-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-00e50deb-5296-4f18-xxxx-106f96a880c8/tpgt_1/np/
10.129.77.16:3260

iqn2-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-86af15c6-c529-4715-xxxx-3c9ca068635d/tpgt_1/np/
10.129.77.16:3260

(I can provide more if needed.)

>
> Is iqn2-tpg1->np1 a different struct than iqn1-tpg1-np1? I mean, would
> iscsit_get_tpg_from_np return a different np struct for initiator B
> and for A?
>

iscsit_get_tpg_from_np() returned a different struct iscsi_portal_group
for initiators A and B, but struct iscsi_np is shared by them because
they have the same portal (IP address and port).

Thanks,
Hou
Thread overview:
2020-07-29 13:03 [PATCH] iscsi-target: fix hang in iscsit_access_np() when getting tpg->np_login_sem Hou Pu
2020-08-31 12:02 ` Hou Pu
2020-09-02  2:57 ` Michael Christie
2020-09-02  4:16 ` Hou Pu [this message]
2020-09-02 19:02 ` Mike Christie
2020-09-03  3:00 ` Martin K. Petersen