From: Hou Pu <houpu@bytedance.com>
To: Michael Christie <michael.christie@oracle.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, target-devel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] iscsi-target: fix hang in iscsit_access_np() when getting tpg->np_login_sem
Date: Wed, 02 Sep 2020 04:16:03 +0000
Message-ID: <e655c868-966d-1846-6bd8-19671cf966d4@bytedance.com>
In-Reply-To: <24875CC6-70FA-477D-BB74-51FBFDD96732@oracle.com>



On 2020/9/2 10:57 AM, Michael Christie wrote:
> 
> 
>> On Jul 29, 2020, at 8:03 AM, Hou Pu <houpu@bytedance.com> wrote:
>>
>> The iSCSI target login thread might get stuck with the following stack:
>>
>> cat /proc/`pidof iscsi_np`/stack
>> [<0>] down_interruptible+0x42/0x50
>> [<0>] iscsit_access_np+0xe3/0x167
>> [<0>] iscsi_target_locate_portal+0x695/0x8ac
>> [<0>] __iscsi_target_login_thread+0x855/0xb82
>> [<0>] iscsi_target_login_thread+0x2f/0x5a
>> [<0>] kthread+0xfa/0x130
>> [<0>] ret_from_fork+0x1f/0x30
>>
>> This can be reproduced with the following steps:
>> 1. Initiator A tries to log in to iqn1-tpg1 on port 3260. After the
>>    PDU exchange in the login thread has finished, but before the
>>    negotiation is finished, the network link goes down. This can
>>    happen in a production environment; I emulated it by bringing
>>    the network card down on the initiator node with ifconfig eth0 down.
>>    (Now A can never finish this login, and tpg->np_login_sem is
>>    held by it.)
>> 2. Initiator B tries to log in to iqn2-tpg1 on port 3260. After the
>>    PDU exchange in the login thread finishes, the target expects to
>>    process the remaining login PDUs in workqueue context.
>> 3. Initiator A' tries to log in again to iqn1-tpg1 on port 3260 from
>>    a new socket. It waits for tpg->np_login_sem, with
>>    np->np_login_timer armed to bound the wait to at most 15 seconds.
>>    (The lock is held by A, and A never gets a chance to release
>>    tpg->np_login_sem, so A' should eventually time out.)
>> 4. Before A' times out, initiator B's negotiation fails and it
>>    calls iscsi_target_login_drop()->iscsi_target_login_sess_out().
>>    This cancels np->np_login_timer, so initiator A' hangs there
>>    forever. Because A' is now in the login thread, no other
>>    login requests can be serviced.
> 
> iqn1 and iqn2 are different targets, right? It’s not clear to me how initiator B failing negotiation cancels the timer for the portal under a different iqn/target.

iqn1-tpg1 in steps 1 and 3 is the same one (the same target volume).
iqn2-tpg1 in step 2 is a different volume on the same host.
The configuration looks like this:

iqn1-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-00e50deb-5296-4f18-xxxx-106f96a880c8/tpgt_1/np/
10.129.77.16:3260

iqn2-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-86af15c6-c529-4715-xxxx-3c9ca068635d/tpgt_1/np/
10.129.77.16:3260

(I can provide more if needed.)

> 
> Is iqn2-tpg1->np1 a different struct than iqn1-tpg1->np1? I mean, would iscsit_get_tpg_from_np() return a different np struct for initiator B and for A?
> 

iscsit_get_tpg_from_np() returned a different struct iscsi_portal_group
for initiators A and B, but the struct iscsi_np is shared by them
because they use the same portal (IP address and port).
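To make the interaction concrete, here is a simplified model of steps 1-4 above. It is a sketch under stated assumptions, not the kernel code: NetworkPortal, PortalGroup, access_np(), login_sess_out() and waiter_can_wake() are hypothetical names, the semaphore is reduced to a free/taken flag, and the timer to an armed/disarmed flag. It captures only the point above: the timer lives in the shared struct iscsi_np, while the semaphore lives in each struct iscsi_portal_group. As a result, a failure on iqn2-tpg1 can disarm the timeout that A' on iqn1-tpg1 depends on.

```python
from dataclasses import dataclass


@dataclass
class NetworkPortal:
    """Hypothetical stand-in for struct iscsi_np: one per IP:port, shared by all TPGs on it."""
    addr: str
    login_timer_armed: bool = False  # models np->np_login_timer (the 15 s bound on the wait)


@dataclass
class PortalGroup:
    """Hypothetical stand-in for struct iscsi_portal_group: one per target/TPG."""
    name: str
    np: NetworkPortal
    login_sem_free: bool = True  # models tpg->np_login_sem with a count of 1


def access_np(tpg: PortalGroup) -> bool:
    """Login thread: arm the shared timer, then try to take this TPG's semaphore.

    Simplification (assumption): a successful acquire disarms the timer at once.
    """
    tpg.np.login_timer_armed = True
    if tpg.login_sem_free:
        tpg.login_sem_free = False
        tpg.np.login_timer_armed = False
        return True
    return False  # the caller now sleeps on the semaphore


def login_sess_out(np: NetworkPortal) -> None:
    """Failure path of ANY login on this portal: cancels the shared timer."""
    np.login_timer_armed = False


def waiter_can_wake(tpg: PortalGroup) -> bool:
    """A sleeper wakes only if the semaphore is released or the timer is still armed."""
    return tpg.login_sem_free or tpg.np.login_timer_armed


def replay_steps() -> bool:
    portal = NetworkPortal("10.129.77.16:3260")  # one iscsi_np for both targets
    iqn1_tpg1 = PortalGroup("iqn1-tpg1", portal)
    iqn2_tpg1 = PortalGroup("iqn2-tpg1", portal)

    access_np(iqn1_tpg1)    # step 1: A takes iqn1-tpg1's semaphore; its link then dies
    access_np(iqn2_tpg1)    # step 2: B takes iqn2-tpg1's semaphore; login continues in a workqueue
    access_np(iqn1_tpg1)    # step 3: A' fails to take it and sleeps with the timer armed
    login_sess_out(portal)  # step 4: B fails negotiation and cancels the shared timer
    return waiter_can_wake(iqn1_tpg1)


if __name__ == "__main__":
    print("A' can still wake up:", replay_steps())
```

Running it prints `A' can still wake up: False`. Once B's failure path clears the timer stored in the shared portal, nothing is left in this model that can wake A'.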


Thanks,
Hou











Thread overview:
2020-07-29 13:03 [PATCH] iscsi-target: fix hang in iscsit_access_np() when getting tpg->np_login_sem Hou Pu
2020-08-31 12:02 ` Hou Pu
2020-09-02  2:57 ` Michael Christie
2020-09-02  4:16   ` Hou Pu [this message]
2020-09-02 19:02 ` Mike Christie
2020-09-03  3:00 ` Martin K. Petersen
