All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Mark Salyzyn" <mark_salyzyn@us.xyratex.com>
To: linux-scsi@vger.kernel.org
Cc: Darrick Wong <djwong@us.ibm.com>,
	Xiangliang Yu <yuxiangl@marvell.com>,
	Jack Wang <jack_wang@usish.com>, Luben Tuikov <ltuikov@yahoo.com>,
	James Bottomley <jbottomley@parallels.com>
Subject: [PATCH] [SCSI] libsas panic when single phy disabled on a wide port
Date: Thu, 22 Sep 2011 08:32:23 -0700	[thread overview]
Message-ID: <FC1F9E38C0703341A1A750524E2525A9018AE2CF@XYUS-EX22.xyus.xyratex.com> (raw)
In-Reply-To: <FC1F9E38C0703341A1A750524E2525A9018AE2C6@XYUS-EX22.xyus.xyratex.com>

[-- Attachment #1: Type: text/plain, Size: 6452 bytes --]

When a wide port is being utilized to a target, if one disables only one
of the
phys, we get an OS crash:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000238
IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
PGD 4103f5067 PUD 41dba9067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/bus/pci/slots/5/address
CPU 0
Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]

Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]
Pid: 5146, comm: scsi_wq_5 Not tainted
2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server
RIP: 0010:[<ffffffff814ca9b1>]  [<ffffffff814ca9b1>]
mutex_lock+0x21/0x50
RSP: 0018:ffff8803e4e33d30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238
RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0
FS:  0000000000000000(0000) GS:ffff880028200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task
ffff8803e4e294a0)
Stack:
 ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e
<0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00
<0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200
Call Trace:
 [<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas]
 [<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas]
 [<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas]
 [<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas]
 [<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas]
 [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0
 [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091b36>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff81091aa0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24
4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff
0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d
RIP  [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
 RSP <ffff8803e4e33d30>
CR2: 0000000000000238

The following patch is admittedly a band-aid, and does not solve the
root cause, but it still is a good candidate for hardening as a pointer
check before reference.

The real patch is enclosed as an attachment since Outlook is my current
MTA.

Signed-off-by: Mark Salyzyn <mark_salyzyn@xyus.xyratex.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Darrick J Wong <djwong@us.ibm.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Jack Wang <jack_wang@usish.com>

 sas_expander.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff -rup scsi-misc-2.6/drivers/scsi/libsas/sas_expander.c
scsi-misc-2.6.new/drivers/scsi/libsas/sas_expander.c
--- scsi-misc-2.6/drivers/scsi/libsas/sas_expander.c	2011-08-31
08:32:21.000000000 -0400
+++ scsi-misc-2.6.new/drivers/scsi/libsas/sas_expander.c
2011-09-22 11:20:06.000000000 -0400
@@ -1769,10 +1769,12 @@ static void sas_unregister_devs_sas_addr
 		sas_disable_routing(parent, phy->attached_sas_addr);
 	}
 	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
-	sas_port_delete_phy(phy->port, phy->phy);
-	if (phy->port->num_phys == 0)
-		sas_port_delete(phy->port);
-	phy->port = NULL;
+	if (phy->port) {
+		sas_port_delete_phy(phy->port, phy->phy);
+		if (phy->port->num_phys == 0)
+			sas_port_delete(phy->port);
+		phy->port = NULL;
+	}
 }
 
 static int sas_discover_bfs_by_root_level(struct domain_device *root,

______________________________________________________________________
This email may contain privileged or confidential information, which should only be used for the purpose for which it was sent by Xyratex. No further rights or licenses are granted to use such information. If you are not the intended recipient of this message, please notify the sender by return and delete it. You may not use, copy, disclose or rely on the information contained in it.
 
Internet email is susceptible to data corruption, interception and unauthorised amendment for which Xyratex does not accept liability. While we have taken reasonable precautions to ensure that this email is free of viruses, Xyratex does not accept liability for the presence of any computer viruses in this email, nor for any losses caused as a result of viruses.
 
Xyratex Technology Limited (03134912), Registered in England & Wales, Registered Office, Langstone Road, Havant, Hampshire, PO9 1SA.
 
The Xyratex group of companies also includes, Xyratex Ltd, registered in Bermuda, Xyratex International Inc, registered in California, Xyratex (Malaysia) Sdn Bhd registered in Malaysia, Xyratex Technology (Wuxi) Co Ltd registered in The People's Republic of China and Xyratex Japan Limited registered in Japan.
______________________________________________________________________
 
\r

[-- Attachment #2: libsas_single_wide.patch --]
[-- Type: application/octet-stream, Size: 4906 bytes --]

When a wide port is being utilized to a target, if one disables only one of the
phys, we get an OS crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000238
IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
PGD 4103f5067 PUD 41dba9067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/bus/pci/slots/5/address
CPU 0
Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]

Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]
Pid: 5146, comm: scsi_wq_5 Not tainted 2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server
RIP: 0010:[<ffffffff814ca9b1>]  [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
RSP: 0018:ffff8803e4e33d30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238
RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task ffff8803e4e294a0)
Stack:
 ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e
<0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00
<0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200
Call Trace:
 [<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas]
 [<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas]
 [<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas]
 [<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas]
 [<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas]
 [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0
 [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091b36>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff81091aa0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff 0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d
RIP  [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
 RSP <ffff8803e4e33d30>
CR2: 0000000000000238

The following patch is admittedly a band-aid, and does not solve the root cause, but it still is a good candidate for hardening as a pointer check before reference.

Signed-off-by: Mark Salyzyn <mark_salyzyn@xyus.xyratex.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Darrick J Wong <djwong@us.ibm.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Jack Wang <jack_wang@usish.com>

 sas_expander.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff -rup scsi-misc-2.6/drivers/scsi/libsas/sas_expander.c scsi-misc-2.6.new/drivers/scsi/libsas/sas_expander.c
--- scsi-misc-2.6/drivers/scsi/libsas/sas_expander.c	2011-08-31 08:32:21.000000000 -0400
+++ scsi-misc-2.6.new/drivers/scsi/libsas/sas_expander.c	2011-09-22 11:20:06.000000000 -0400
@@ -1769,10 +1769,12 @@ static void sas_unregister_devs_sas_addr
 		sas_disable_routing(parent, phy->attached_sas_addr);
 	}
 	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
-	sas_port_delete_phy(phy->port, phy->phy);
-	if (phy->port->num_phys == 0)
-		sas_port_delete(phy->port);
-	phy->port = NULL;
+	if (phy->port) {
+		sas_port_delete_phy(phy->port, phy->phy);
+		if (phy->port->num_phys == 0)
+			sas_port_delete(phy->port);
+		phy->port = NULL;
+	}
 }
 
 static int sas_discover_bfs_by_root_level(struct domain_device *root,

  reply	other threads:[~2011-09-22 15:33 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28  4:19 [PATCH] [SCSI] libsas: Allow expander T-T attachments Luben Tuikov
2011-07-28  6:46 ` Jack Wang
2011-07-28  6:46   ` Jack Wang
2011-07-28  7:52 ` James Bottomley
2011-07-28  7:52   ` James Bottomley
2011-09-21  0:50   ` Dan Williams
2011-09-21  0:50     ` Dan Williams
2011-09-22 11:30 ` James Bottomley
2011-09-22 11:30   ` James Bottomley
2011-09-22 15:04   ` Mark Salyzyn
2011-09-22 15:04     ` Mark Salyzyn
2011-09-22 15:32     ` Mark Salyzyn [this message]
2011-09-22 15:50       ` [PATCH] [SCSI] pm8001 DEV_IS_GONE infinite retry Mark Salyzyn
2011-09-26  2:20         ` Jack Wang
2011-09-26 13:15           ` Mark Salyzyn
2011-09-26 14:57         ` [PATCH] [SCSI] pm8001 missing break statements Mark Salyzyn
2011-09-27  4:27           ` Jack Wang
2011-09-30  2:21       ` [PATCH] [SCSI] libsas panic when single phy disabled on a wide port Jack Wang
2011-10-01  1:43         ` Dan Williams
2011-10-03 13:07           ` Mark Salyzyn
2011-10-03 15:58             ` Mark Salyzyn
2011-10-04  8:35               ` Jack Wang
2011-10-04 23:30               ` Dan Williams
2011-10-04 23:38                 ` Dan Williams
2011-10-05 12:10                 ` Mark Salyzyn
2011-10-06  3:33                   ` Jack Wang
2011-09-22 16:30     ` [PATCH] [SCSI] libsas: Allow expander T-T attachments Luben Tuikov
2011-09-22 16:35     ` Luben Tuikov
2011-09-22 17:03       ` Christoph Hellwig
2011-09-22 17:11         ` Luben Tuikov
2011-09-22 17:11           ` Luben Tuikov
2011-09-23 18:42           ` Christoph Hellwig
2011-09-23 18:46             ` Alan Cox
2011-09-22 16:41   ` [RESEND] " Luben Tuikov
2011-09-22 16:50     ` Christoph Hellwig
2011-09-22 17:24       ` Luben Tuikov
2011-09-22 17:32         ` Bart Van Assche
2011-09-22 16:55     ` Mark Salyzyn
2011-09-22 16:55       ` Mark Salyzyn
2011-09-22 17:19       ` Luben Tuikov
2011-09-22 17:19         ` Luben Tuikov
2011-09-22 17:44         ` Mark Salyzyn
2011-09-22 17:44           ` Mark Salyzyn
2011-09-22 17:48       ` Dan Williams
2011-09-22 17:48         ` Dan Williams
2011-09-22 17:41     ` Dan Williams
2011-09-23  1:47       ` Jack Wang
2011-09-23  1:47         ` Jack Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FC1F9E38C0703341A1A750524E2525A9018AE2CF@XYUS-EX22.xyus.xyratex.com \
    --to=mark_salyzyn@us.xyratex.com \
    --cc=djwong@us.ibm.com \
    --cc=jack_wang@usish.com \
    --cc=jbottomley@parallels.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=ltuikov@yahoo.com \
    --cc=yuxiangl@marvell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.