From: Dan Williams <dan.j.williams@intel.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	Dariusz Majchrzak <dariusz.majchrzak@intel.com>
Subject: Re: [PATCH 12/12] scsi_transport_sas: fix delete vs scan race
Date: Sun, 20 May 2012 12:20:06 -0700
Message-ID: <CAA9_cmeL5h_5xESis06pyT-7bt+K2eQrN5SR6_b25qLBSDVXvA@mail.gmail.com>
In-Reply-To: <CAA9_cmcCQtyRBEt-c8EP6wuSukqZn0Mswxi3nDG7R-88L57BjA@mail.gmail.com>

On Sat, May 5, 2012 at 2:52 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Apr 22, 2012 at 10:15 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> Async scan here means any scan in a different thread, right ... it just
>> has to be asynchronous relative to us?  So that includes the manually
>> initiated ones and hotplug ones, doesn't it?
>
> [ resend since I notice this never hit the lists ]
>
> Hmm, well no I don't think so.  This literally means the initial async
> scan, and the failure window is between when we skip the call to
> scsi_sysfs_add_sdev() (in scsi_add_lun() under the scan_mutex) and
> finally call scsi_sysfs_add_sdev() again via scsi_finish_async_scan().
> I don't see how that fixes it because when we fail the sequence goes:
>
> mutex_lock(scan_mutex)
> starget->parent = end_device;
> scsi_add_lun()
> mutex_unlock(scan_mutex)
>
> device_del(end_device)
>
> mutex_lock(scan_mutex)
> device_add(starget)
> <crash>
>
> As far as I can see taking the scan_mutex in sas_rphy_remove() does
> not change this failure window.  Unless I missed something?
>
> I am going to re-submit this patch as is with the proposed libsas batch for 3.5.

It turns out this patch can cause a deadlock in the scenario where we
have two hosts scanning and the "previous" host (according to the
async scan queue) experiences a device removal event.  I think the
following should be all we need:

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 01b0374..8906557 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1714,6 +1714,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
        struct scsi_device *sdev;
        shost_for_each_device(sdev, shost) {
+               /* target removed before the device could be added */
+               if (sdev->sdev_state == SDEV_DEL)
+                       continue;
                if (!scsi_host_scan_allowed(shost) ||
                    scsi_sysfs_add_sdev(sdev) != 0)
                        __scsi_remove_device(sdev);

...since starget removal will mark the sdevs as deleted under
scan_mutex.  scsi_sysfs_add_devices() can simply ignore deleted
devices.  I'll post this patch after Darek has a chance to try it out.
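
For illustration only, the skip-deleted-devices idea can be modeled in
plain userspace C.  This is a rough sketch, not kernel code: the
fake_sdev struct, the FAKE_SDEV_* states, and both helper functions are
hypothetical stand-ins, and the model ignores locking and the
scsi_host_scan_allowed() / __scsi_remove_device() path entirely.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's scsi_device states. */
enum fake_sdev_state {
	FAKE_SDEV_CREATED,
	FAKE_SDEV_RUNNING,
	FAKE_SDEV_DEL,
};

/* Hypothetical, heavily reduced model of a struct scsi_device. */
struct fake_sdev {
	int id;
	enum fake_sdev_state state;
	int in_sysfs;	/* 1 once the device has been "added to sysfs" */
};

/* Models a successful scsi_sysfs_add_sdev(); returns 0 on success. */
static int fake_sysfs_add_sdev(struct fake_sdev *sdev)
{
	sdev->in_sysfs = 1;
	sdev->state = FAKE_SDEV_RUNNING;
	return 0;
}

/*
 * Models the patched loop: devices whose target was removed before the
 * deferred sysfs add are already marked deleted, so they are skipped.
 * Returns the number of devices that were added.
 */
static size_t fake_sysfs_add_devices(struct fake_sdev *devs, size_t n)
{
	size_t added = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* target removed before the device could be added */
		if (devs[i].state == FAKE_SDEV_DEL)
			continue;
		if (fake_sysfs_add_sdev(&devs[i]) == 0)
			added++;
	}
	return added;
}
```

Calling fake_sysfs_add_devices() on an array in which one entry is
already FAKE_SDEV_DEL leaves that entry untouched and adds the rest.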

--
Dan
