From: Mike Christie <michael.christie@oracle.com>
To: Dmitry Bogdanov <d.bogdanov@yadro.com>
Cc: mlombard@redhat.com, martin.petersen@oracle.com,
	mgurtovoy@nvidia.com, sagi@grimberg.me,
	linux-scsi@vger.kernel.org, target-devel@vger.kernel.org
Subject: Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
Date: Wed, 15 Mar 2023 16:42:19 -0500
Message-ID: <5a1c2ac4-d7cd-b7fe-cc74-7e58e8fca968@oracle.com>
In-Reply-To: <20230315191141.GF1031@yadro.com>

On 3/15/23 2:11 PM, Dmitry Bogdanov wrote:
> On Wed, Mar 15, 2023 at 11:44:48AM -0500, Mike Christie wrote:
>>
>> On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
>>> On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
>>>>
>>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>>>> up running commands when it hasn't. The bug was added in:
>>>>
>>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>>>
>>>> The problem occurs when:
>>>>
>>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>>>> 2. The initiator sends a LUN_RESET for each session.
>>>> 3. session1's LUN_RESET loops over all the running commands from both
>>>> sessions and moves them to its local drain_task_list.
>>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>>>> the commit above has it remove itself. session2 also does not see any
>>>> commands since the other reset moved them off the state lists.
>>>> 5. session2's LUN_RESET will then complete with a successful response.
>>>> 6. session2's initiator believes the running commands on its session are
>>>> now cleaned up due to the successful response and cleans up the running
>>>> commands from its side. It then restarts them.
>>>> 7. The commands do eventually complete on the backend and the target
>>>> starts to return aborted task statuses for them. The initiator will
>>>> either throw an invalid ITT error or might accidentally look up a new
>>>> task if the ITT has already been reallocated.
>>>>
>>>> This fixes the bug by reverting the patch, and also serializes the
>>>> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
>>>> because it turns out the commit accidentally fixed a bug where, if there
>>>> are 2 LUN_RESETs executing, they can see each other on the dev_tmr_list,
>>>> put the other one on their local drain list, and then end up waiting on
>>>> each other, resulting in a deadlock.
>>>
>>> If the LUN_RESET is not on the TMR list anymore, there is no need to
>>> serialize core_tmr_drain_tmr_list.
>>
>> Ah shoot, yeah, I miswrote that. I meant I needed the serialization for my
>> bug, not yours.
> 
> I still don't get why you are wrapping core_tmr_drain_*_list in a mutex.
> The general_tmr_list only has aborts now, and they do not wait for other
> aborts.

Do you mean I don't need the mutex for the bug I originally hit that's described
at the beginning? If you're saying I don't need it for 2 resets running at the same
time, I agree. I thought I needed it if we have a RESET and Preempt and Abort:

1. You have 2 sessions. There are no TMRs initially.
2. session1 gets a Preempt and Abort. It calls core_tmr_drain_state_list,
which takes all the cmds from both sessions and puts them on its local
drain_task_list.
3. session1 or session2 gets a LUN_RESET. It sees no cmds on the device's
state lists and returns success.
4. The initiator thinks the commands were cleaned up by the LUN_RESET.

- The initiator could end up re-using the ITT while the original task being
cleaned up is still running. Then, depending on which session got which TMF
and whether TAS was set, if the original command completes first the
initiator would think the second command failed with SAM_STAT_TASK_ABORTED.

- If there was no TAS, or the RESET and the Preempt and Abort were on the
same session, then we could still hit a bug. We get the RESET response, the
initiator might retry the cmds or fail them, and the app might retry. The
retry might go down a completely different path on the target (for example,
if hw queue1 was blocked and had the original command, but the retry goes
down hw queue2 because it was received on a different CPU, it completes
right away). We do some new IO. Then hw queue1 unblocks and overwrites the
new IO.

With the mutex, the LUN_RESET will wait for the Preempt and Abort,
which is waiting on the running commands. I could have had Preempt
and Abort create a tmr and queue a work so it goes through that same
path, but I thought faking it that way looked uglier.
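
Roughly what I have in mind, just as a sketch (lun_reset_mutex is a
made-up name for whatever field we would add to se_device):

	/* hypothetical field in struct se_device */
	struct mutex		lun_reset_mutex;

	/*
	 * core_tmr_lun_reset(), sketch: both LUN_RESET and the Preempt
	 * and Abort path come through here, so taking the mutex around
	 * both drains means a LUN_RESET can't return success while a
	 * Preempt and Abort is still draining the cmds it took off the
	 * state lists.
	 */
	mutex_lock(&dev->lun_reset_mutex);
	if (tmr)
		core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
	core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
				  preempt_and_abort_list);
	mutex_unlock(&dev->lun_reset_mutex);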


> 
>>
>>>>
>>>>         if (cmd->transport_state & CMD_T_ABORTED)
>>>> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
>>>>                         target_dev_ua_allocate(dev, 0x29,
>>>>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
>>>>                 }
>>>> +
>>>> +               /*
>>>> +                * If this is the last reset the device can be freed after we
>>>> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
>>>> +                * are other resets that need to be scheduled while we know we
>>>> +                * have a refcount on the device.
>>>> +                */
>>>> +               spin_lock_irq(&dev->se_tmr_lock);
>>>
>>> tmr->tmr_list is removed from the list at the very end of the se_cmd
>>> lifecycle, so any number of LUN_RESETs can be on the lun_reset_tmr_list.
>>> And all of them can be finished but not yet removed from the list.
>>
>> Don't we remove it from the list a little later in this function when
>> we call transport_lun_remove_cmd?
> 
> OMG, yes, of course, you are right. I mixed something up.
> 
> But I still have concerns:
> transport_lookup_tmr_lun (where the LUN_RESET is added to the list) and
> transport_generic_handle_tmr (where the LUN_RESET is scheduled for
> handling) are not serialized. With the code below, you can start the
> second LUN_RESET while transport_generic_handle_tmr has not yet been
> called for it. The _handle_tmr could be delayed long enough for the first
> work to handle that second LUN_RESET and clear the flag. _handle_tmr will
> then start the work again.

Ah yeah, nice catch.
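
Just to make sure I follow, the window is something like this (my
reading of it, as a timeline):

	/*
	 * CPU1 (reset1)                  CPU2 (reset2)
	 * transport_lookup_tmr_lun()
	 *   adds reset1 to the list
	 *                                transport_lookup_tmr_lun()
	 *                                  adds reset2 to the list
	 * transport_generic_handle_tmr()
	 *   work runs, handles reset1,
	 *   finds reset2 on the list,
	 *   handles it and clears the
	 *   flag
	 *                                transport_generic_handle_tmr()
	 *                                  runs late, sees the flag is
	 *                                  clear and starts the work for
	 *                                  reset2 a second time
	 */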

> 
> Is it worth doing that list management? Is it not enough to just wrap
> the core_tmr_lun_reset() call in target_tmr_work in a mutex?

I can just do the mutex.

I was trying to reduce how many threads we use, but the big win is for
aborts. I'll work on that type of thing in a separate patchset.
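
For reference, I think what you're suggesting looks something like this
(sketch only; lun_reset_mutex is the same made-up name as above, and the
Preempt and Abort path would need to take the same mutex):

	/* target_tmr_work(), sketch of the TMR_LUN_RESET case */
	case TMR_LUN_RESET:
		mutex_lock(&dev->lun_reset_mutex);
		tmr->response = (!core_tmr_lun_reset(dev, tmr, NULL, NULL)) ?
					TMR_FUNCTION_COMPLETE :
					TMR_FUNCTION_REJECTED;
		mutex_unlock(&dev->lun_reset_mutex);
		break;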


> Better to have a separate variable used only under the lock.
>
Will fix.



Thread overview: 30+ messages
2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
2023-03-09 22:32 ` [PATCH 02/18] scsi: target: Move cmd counter allocation Mike Christie
2023-03-09 22:32 ` [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2023-03-09 22:32 ` [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
2023-03-09 22:32 ` [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
2023-03-09 22:33 ` [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post Mike Christie
2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
2023-03-15 10:47   ` Dmitry Bogdanov
2023-03-15 22:54     ` Mike Christie
2023-03-16  0:01       ` michael.christie
2023-03-09 22:33 ` [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
2023-03-09 22:33 ` [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
2023-03-09 22:33 ` [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
2023-03-15 15:21   ` Sagi Grimberg
2023-03-09 22:33 ` [PATCH 12/18] scsi: target: iscsit: free cmds before session free Mike Christie
2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
2023-03-15 16:13   ` Dmitry Bogdanov
2023-03-15 16:44     ` Mike Christie
2023-03-15 19:11       ` Dmitry Bogdanov
2023-03-15 21:42         ` Mike Christie [this message]
2023-03-16 10:39           ` Dmitry Bogdanov
2023-03-16 16:03             ` Mike Christie
2023-03-16 16:07             ` Mike Christie
2023-03-09 22:33 ` [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks Mike Christie
2023-03-09 22:33 ` [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
2023-03-09 22:33 ` [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
2023-03-09 22:33 ` [PATCH 17/18] scsi: target: Remove sess_cmd_lock Mike Christie
2023-03-09 22:33 ` [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd Mike Christie
