All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH 12/12] scsi_transport_sas: fix delete vs scan race
Date: Sun, 22 Apr 2012 08:43:22 -0700	[thread overview]
Message-ID: <CABE8wwssa-_MsCTe0FJeCLY5KTk1sRcsmUjF8Sb-5sofc=zuFQ@mail.gmail.com> (raw)
In-Reply-To: <1335091115.13208.14.camel@dabdike.lan>

On Sun, Apr 22, 2012 at 3:38 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2012-04-13 at 16:37 -0700, Dan Williams wrote:
>> The following crash results from cases where the end_device has been
>> removed before scsi_sysfs_add_sdev has had a chance to run.
>>
>>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
>>  IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
>>  ...
>>  Call Trace:
>>   [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
>>   [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
>>   [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
>>   [<ffffffff8125e70b>] kobject_add+0x64/0x66
>>   [<ffffffff8131122b>] device_add+0x12d/0x63a
>>   [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
>>   [<ffffffff8107de15>] ? module_refcount+0x89/0xa0
>>   [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
>>   [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145
>>
>> ...teach sas_rphy_remove to wait for async scanning to quiesce before
>> removing the end_device.  It seems this is a more general problem [1],
>> but this patch only addresses sas transport.
>>
>> [1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from
>> the slave_destroy callback when all the LUNS have been deleted
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/scsi/scsi_transport_sas.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
>> index f7565fc..47abb90 100644
>> --- a/drivers/scsi/scsi_transport_sas.c
>> +++ b/drivers/scsi/scsi_transport_sas.c
>> @@ -33,8 +33,9 @@
>>  #include <linux/bsg.h>
>>
>>  #include <scsi/scsi.h>
>> -#include <scsi/scsi_device.h>
>>  #include <scsi/scsi_host.h>
>> +#include <scsi/scsi_scan.h>
>> +#include <scsi/scsi_device.h>
>>  #include <scsi/scsi_transport.h>
>>  #include <scsi/scsi_transport_sas.h>
>>
>> @@ -1667,6 +1668,9 @@ sas_rphy_remove(struct sas_rphy *rphy)
>>  {
>>       struct device *dev = &rphy->dev;
>>
>> +     /* prevent device_del() while child device_add() may be in-flight */
>> +     scsi_complete_async_scans();
>> +
>>       switch (rphy->identify.device_type) {
>
> This doesn't really fix the problem, it merely narrows the window (we
> still crash in the much shorter window if another async scan starts
> after you check for completion).

Oh, I was under the impression that async scanning was only the
initial scan and everything was sync thereafter since
scsi_finish_async_scan() clears the host ->async_scan flag?

> Isn't the fix that will prevent all of
> this to hold the scan mutex across scsi_remove_device() ... in which
> case it should probably be part of scsi_remove_device()?

I thought along these lines initially, but in this case we're crashing
because the sas rphy is removed before the starget is added, so
scsi_remove_device() is out of the picture.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2012-04-22 15:43 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-13 23:36 [GIT PATCH 00/12] libsas fixes for 3.4 Dan Williams
2012-04-13 23:36 ` [PATCH 01/12] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
2012-04-13 23:37 ` [PATCH 02/12] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
2012-04-13 23:37 ` [PATCH 03/12] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
2012-04-21  6:19   ` Jeff Garzik
2012-04-22 17:30   ` James Bottomley
2012-04-23  2:33     ` Jeff Garzik
2012-04-23  8:10       ` James Bottomley
2012-04-23 19:13         ` Dan Williams
2012-04-23 22:22           ` James Bottomley
2012-04-23 22:49             ` Dan Williams
2012-04-24 10:11               ` Jacek Danecki
2012-04-23 19:41     ` Dan Williams
2012-04-26 17:21       ` Dan Williams
2012-04-13 23:37 ` [PATCH 04/12] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys Dan Williams
2012-04-13 23:37 ` [PATCH 05/12] libsas: fix sas_get_port_device regression Dan Williams
2012-04-13 23:37 ` [PATCH 06/12] libsas: unify domain_device sas_rphy lifetimes Dan Williams
2012-04-13 23:37 ` [PATCH 07/12] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready Dan Williams
2012-04-13 23:37 ` [PATCH 08/12] libata: make ata_print_id atomic Dan Williams
2012-04-13 23:37 ` [PATCH 09/12] libsas, libata: fix start of life for a sas ata_port Dan Williams
2012-04-21  6:20   ` Jeff Garzik
2012-04-13 23:37 ` [PATCH 10/12] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Dan Williams
2012-04-21 12:22   ` James Bottomley
2012-04-22 15:24     ` Dan Williams
2012-04-13 23:37 ` [PATCH 11/12] libsas: fix false positive 'device attached' conditions Dan Williams
2012-04-22 10:53   ` James Bottomley
2012-04-22 15:56     ` Dan Williams
2012-04-13 23:37 ` [PATCH 12/12] scsi_transport_sas: fix delete vs scan race Dan Williams
2012-04-22 10:38   ` James Bottomley
2012-04-22 15:43     ` Dan Williams [this message]
2012-04-22 17:15       ` James Bottomley
2012-05-05 21:52         ` Dan Williams
2012-05-20 19:20           ` Dan Williams
2012-04-14  8:19 ` [GIT PATCH 00/12] libsas fixes for 3.4 jack_wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABE8wwssa-_MsCTe0FJeCLY5KTk1sRcsmUjF8Sb-5sofc=zuFQ@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.