All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Tejun Heo <tj@kernel.org>,
	linux-ide@vger.kernel.org,
	Tom Jackson <thomas.p.jackson@intel.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH 10/12] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
Date: Sun, 22 Apr 2012 08:24:43 -0700	[thread overview]
Message-ID: <CABE8wwvJgS9iGchsOLzrJ7b5+coLoU9mof7m3oVSBM8Ae0-y=g@mail.gmail.com> (raw)
In-Reply-To: <1335010949.3081.8.camel@dabdike.lan>

On Sat, Apr 21, 2012 at 5:22 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2012-04-13 at 16:37 -0700, Dan Williams wrote:
>> Rapid ata hotplug on a libsas controller results in cases where libsas
>> is waiting indefinitely on eh to perform an ata probe.
>>
>> A race exists between scsi_schedule_eh() and scsi_restart_operations()
>> in the case when scsi_restart_operations() issues i/o to other devices
>> in the sas domain.  When this happens the host state transitions from
>> SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
>> ->host_busy is non-zero so we put the eh thread to sleep even though
>> ->host_eh_scheduled is active.
>>
>> Before putting the error handler to sleep we need to check if the
>> host_state needs to return to SHOST_RECOVERY for another trip through
>> eh.
>>
>> Cc: Tejun Heo <tj@kernel.org>
>> Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
>> Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/scsi/scsi_error.c |   14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 2cfcbff..0945d47 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -1687,6 +1687,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
>>        * requests are started.
>>        */
>>       scsi_run_host_queues(shost);
>> +
>> +     /*
>> +      * if eh is active and host_eh_scheduled is pending we need to re-run
>> +      * recovery.  we do this check after scsi_run_host_queues() to allow
>> +      * everything pent up since the last eh run a chance to make forward
>> +      * progress before we sync again.  Either we'll immediately re-run
>> +      * recovery or scsi_device_unbusy() will wake us again when these
>> +      * pending commands complete.
>> +      */
>> +     spin_lock_irqsave(shost->host_lock, flags);
>> +     if (shost->host_eh_scheduled)
>> +             if (scsi_host_set_state(shost, SHOST_RECOVERY))
>> +                     WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
>> +     spin_unlock_irqrestore(shost->host_lock, flags);
>
> This doesn't really look to be the way to fix the race, because we'll
> start up the host again before closing it down.

That's part of the intent.  Any command that has been waiting to be
restarted has been potentially waiting up to a minute for recovery to
complete, so the idea is that we trickle the blocked commands through
and then come back for another pass.  This is close to what "late"
callers sto csi_schedule_eh() experience anyways.

> Isn't the correct way to put
>
> if (shost->host_eh_scheduled)
>        continue;
>
> into the scsi_error_handler() loop just *before*
> scsi_restart_operations()?

I think that only narrows the window...  The hole is
scsi_schedule_eh() called anytime after the last run completes and
before the change of the host state, so host_eh_scheduled can go
active immediately after that check right?

We could move the check into scsi_restart_operations() under the lock
before we change the state back to running, but I figure if we got
that far just let the queue run again.

--
Dan

  reply	other threads:[~2012-04-22 15:24 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-13 23:36 [GIT PATCH 00/12] libsas fixes for 3.4 Dan Williams
2012-04-13 23:36 ` [PATCH 01/12] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
2012-04-13 23:37 ` [PATCH 02/12] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
2012-04-13 23:37 ` [PATCH 03/12] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
2012-04-21  6:19   ` Jeff Garzik
2012-04-22 17:30   ` James Bottomley
2012-04-23  2:33     ` Jeff Garzik
2012-04-23  8:10       ` James Bottomley
2012-04-23 19:13         ` Dan Williams
2012-04-23 22:22           ` James Bottomley
2012-04-23 22:49             ` Dan Williams
2012-04-24 10:11               ` Jacek Danecki
2012-04-23 19:41     ` Dan Williams
2012-04-26 17:21       ` Dan Williams
2012-04-13 23:37 ` [PATCH 04/12] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys Dan Williams
2012-04-13 23:37 ` [PATCH 05/12] libsas: fix sas_get_port_device regression Dan Williams
2012-04-13 23:37 ` [PATCH 06/12] libsas: unify domain_device sas_rphy lifetimes Dan Williams
2012-04-13 23:37 ` [PATCH 07/12] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready Dan Williams
2012-04-13 23:37 ` [PATCH 08/12] libata: make ata_print_id atomic Dan Williams
2012-04-13 23:37 ` [PATCH 09/12] libsas, libata: fix start of life for a sas ata_port Dan Williams
2012-04-21  6:20   ` Jeff Garzik
2012-04-13 23:37 ` [PATCH 10/12] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Dan Williams
2012-04-21 12:22   ` James Bottomley
2012-04-22 15:24     ` Dan Williams [this message]
2012-04-13 23:37 ` [PATCH 11/12] libsas: fix false positive 'device attached' conditions Dan Williams
2012-04-22 10:53   ` James Bottomley
2012-04-22 15:56     ` Dan Williams
2012-04-13 23:37 ` [PATCH 12/12] scsi_transport_sas: fix delete vs scan race Dan Williams
2012-04-22 10:38   ` James Bottomley
2012-04-22 15:43     ` Dan Williams
2012-04-22 17:15       ` James Bottomley
2012-05-05 21:52         ` Dan Williams
2012-05-20 19:20           ` Dan Williams
2012-04-14  8:19 ` [GIT PATCH 00/12] libsas fixes for 3.4 jack_wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABE8wwvJgS9iGchsOLzrJ7b5+coLoU9mof7m3oVSBM8Ae0-y=g@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=thomas.p.jackson@intel.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.