linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: Bodo Stroesser <bstroesser@ts.fujitsu.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, target-devel@vger.kernel.org
Subject: Re: [PATCH] scsi: target: loop: Fix handling of aborted TMRs
Date: Thu, 6 Aug 2020 20:42:07 -0500	[thread overview]
Message-ID: <61420558-b077-13ca-b5e3-6269fb419233@oracle.com> (raw)
In-Reply-To: <b8fc90b9-7038-b202-e7ef-bc4da56816f0@oracle.com>

On 7/27/20 1:13 PM, Mike Christie wrote:
> I would really like to figure out if #2 is a regression and understand what happened so we don't make the same mistake again and also fix iscsi. The problem with iscsi though is every couple kernels has a bug in this code path so git bisect is being a pain.

Sorry for the delay. I was able to track down the hang part of this. It turns out loop did work at some point. It worked a couple times over the years, but every X kernels we would break it again.

The last time we didn't have the hang was because we executed TMFs one at a time. We would end up always calling queue_tm_rsp since the LUN reset was always going to execute after the abort. So to fix the wake up / hang part of the bug for everyone we could revert:

commit db5b21a24e01d35495014076700efa02d6dcbb68
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Tue Nov 27 15:51:59 2018 -0800

    scsi: target/core: Use system workqueues for TMF

It's not clear how much processing TMFs for the same device in parallel helps. If it doesn't help, this would be an easy fix for everyone.

One other thing that is not clear to me is if aborted_task was always meant to be used for task management requests. I think when it was first added, it was only called for normal cmds. I couldn't tell exactly when we meant (vs it snuck in by accident) to start calling it for all commands.

If the async TMF feature helps, then I'm ok with with your fix for loop where we say the aborted_task callout was meant for all commands and go from there. We then have to do something similar in xen, and check that the other drivers didn't expect the old behavior. I think for that qla2xxx bug you mentioned, tcm_qla2xxx_aborted_task didn't expect a qla_tgt_mgmt_cmd to be passed into aborted_task so it crashed trying to access it as a qla_tgt_cmd.

What do you think?

      reply	other threads:[~2020-08-07  1:44 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 16:04 [PATCH] scsi: target: loop: Fix handling of aborted TMRs Bodo Stroesser
2020-07-26  5:16 ` Mike Christie
2020-07-26 11:02   ` Bodo Stroesser
2020-07-26 18:37     ` Mike Christie
2020-07-27 14:26       ` Bodo Stroesser
2020-07-27 18:13         ` Mike Christie
2020-08-07  1:42           ` Mike Christie [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61420558-b077-13ca-b5e3-6269fb419233@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=bstroesser@ts.fujitsu.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).