* aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
@ 2007-11-30 9:22 Krzysztof Błaszkowski
2007-11-30 21:33 ` Darrick J. Wong
0 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2007-11-30 9:22 UTC (permalink / raw)
To: linux-scsi; +Cc: Vladislav Bolkhovitin
Hello all,
I noticed this crash in the syslog. Furthermore, if the aic94xx is connected to
a single SATA drive only, there is no crash, but the device is not recognized
either (the mysterious "ERROR: Unidentified device type 5").
A crash recorded in syslog:
aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
ACPI: PCI Interrupt 0000:04:02.0[A] -> GSI 16 (level, low) -> IRQ 16
aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:04:02.0
scsi6 : aic94xx
PM: Adding info for No Bus:host6
PM: Adding info for No Bus:0000:04:02.0
PM: Removing info for No Bus:0000:04:02.0
aic94xx: Found sequencer Firmware version 1.1 (V30)
aic94xx: device 0000:04:02.0: SAS addr 500304800004ce20, PCBA SN ORG, 8 phys,
8 enabled phys, flash present, BIOS build 1822
PM: Adding info for No Bus:phy-6:0
<snip>
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000074
printing eip:
f8e2daf9
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: aic94xx firmware_class libsas scsi_transport_sas nfsd
exportfs nvram speedstep_lib freq_table thermal processor fan button battery
edd ac ipv6 evdev joydev sr_mod ide_cd cdrom e1000 ehci_hcd i2c_i801 uhci_hcd
rng_core dm_mod usbcore
CPU: 0
EIP: 0060:[<f8e2daf9>] Not tainted VLI
EFLAGS: 00010286 (2.6.22.8 #6)
EIP is at sas_rphy_add+0x9/0x100 [scsi_transport_sas]
eax: 00000000 ebx: 00000000 ecx: 00000004 edx: 00000282
esi: f2d8c080 edi: 00000000 ebp: f2d8c080 esp: f33bbe84
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process scsi_wq_6 (pid: 8265, ti=f33ba000 task=f5712a90 task.ti=f33ba000)
Stack: f2d8c080 00000000 00000000 f2d8c080 f2d8c0d7 f2d8c080 f8e8bdc2 f46b49e0
f2d8c114 f8e8d040 f7c4446c f704e724 f704e6c0 00000000 00000001 00000000
f46b49fc f8e8d741 f7c44438 00000000 f33bbedc 402e9267 f7c44380 ffffffed
Call Trace:
[<f8e8bdc2>] sas_discover_sata+0x42/0x80 [libsas]
[<f8e8d040>] sas_ex_discover_end_dev+0x120/0x2d0 [libsas]
[<f8e8d741>] sas_ex_discover_dev+0x2d1/0x470 [libsas]
[<402e9267>] attribute_container_device_trigger+0xa7/0xb0
[<f8e8daa3>] sas_ex_discover_devices+0x83/0xb0 [libsas]
[<f8e8e6d3>] sas_ex_level_discovery+0x43/0x70 [libsas]
[<f8e8e71b>] sas_ex_bfs_disc+0x1b/0x30 [libsas]
[<f8e8e76e>] sas_discover_root_expander+0x3e/0x80 [libsas]
[<f8e8bf40>] sas_discover_domain+0x0/0xc0 [libsas]
[<f8e8bfea>] sas_discover_domain+0xaa/0xc0 [libsas]
[<40131541>] run_workqueue+0x71/0x100
[<4013167c>] worker_thread+0xac/0x110
[<401352a0>] autoremove_wake_function+0x0/0x50
[<401352a0>] autoremove_wake_function+0x0/0x50
[<401315d0>] worker_thread+0x0/0x110
[<40134d24>] kthread+0x64/0xa0
[<40134cc0>] kthread+0x0/0xa0
[<401048b7>] kernel_thread_helper+0x7/0x10
=======================
Code: f0 83 c4 1c 5b 5e 5f 5d c3 0f 0b 8d b4 26 00 00 00 00 eb fe 8d b4 26 00
00 00 00 8d bc 27 00 00 00 00 55 57 89 c7 56 53 83 ec 08 <8b> 70 74 8b 5e 74
eb 0b 8b 43 74 31 d2 85 c0 74 13 89 c3 89 d8
EIP: [<f8e2daf9>] sas_rphy_add+0x9/0x100 [scsi_transport_sas] SS:ESP 0068:f33bbe84
Let me know if you need any more information. I used the latest firmware
available from Adaptec's site.
Best regards,
Krzysztof Blaszkowski
Systemy mikroprocesorowe
Storrady 1
PL71602 Szczecin, Poland
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-11-30 9:22 aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives Krzysztof Błaszkowski
@ 2007-11-30 21:33 ` Darrick J. Wong
2007-12-03 15:11 ` Krzysztof Błaszkowski
2007-12-03 16:09 ` Krzysztof Błaszkowski
0 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2007-11-30 21:33 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: linux-scsi, Vladislav Bolkhovitin, Alexis Bruemmer
On Fri, Nov 30, 2007 at 10:22:07AM +0100, Krzysztof Błaszkowski wrote:
> Hello all,
>
> I noticed this crash in the syslog. Furthermore, if the aic94xx is connected to
> a single SATA drive only, there is no crash, but the device is not recognized
> either (the mysterious "ERROR: Unidentified device type 5").
There's been a substantial amount of bugfixes (as well as SATA support)
that went into the aic94xx/libsas code between .22 and .23; could you
please give that a try?
Also, what kind of devices are attached when the system crashes? From
that stack trace it looks like the software thought there was a SATA
disk attached to an expander...?
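A side note on reading the oops quoted above: the bytes at the faulting EIP, `<8b> 70 74`, decode on i386 as `mov 0x74(%eax),%esi`, and eax is 00000000 in the register dump, which matches the reported fault address 00000074. The arithmetic can be sketched in C as below. This is only a minimal illustration: `struct modrm`, `decode_modrm` and `effective_address` are hypothetical names invented here, and the helper handles only the simple "base register + 8-bit displacement" form, not general x86 decoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration only; not kernel code. */
struct modrm {
	unsigned mod; /* addressing mode, bits 7..6 */
	unsigned reg; /* register operand, bits 5..3 */
	unsigned rm;  /* base register, bits 2..0 */
};

/* Split an x86 ModRM byte into its three fields. */
static struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
	return m;
}

/*
 * Effective address for the mod == 1 form: base register plus a
 * sign-extended 8-bit displacement. rm == 4 would mean a SIB byte
 * follows, which this sketch does not handle.
 */
static uint32_t effective_address(struct modrm m, uint32_t base, uint8_t disp8)
{
	assert(m.mod == 1 && m.rm != 4);
	return base + (uint32_t)(int32_t)(int8_t)disp8;
}
```

For opcode 8b with ModRM byte 0x70, the fields are mod=1, reg=6 (esi) and rm=0 (eax); with eax = 0 and displacement 0x74 the helper yields 0x74. That is a load through a NULL pointer at offset 0x74, which would be consistent with sas_rphy_add() having been handed a NULL rphy, though the trace alone does not prove which caller path produced it.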
--D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-11-30 21:33 ` Darrick J. Wong
@ 2007-12-03 15:11 ` Krzysztof Błaszkowski
2007-12-03 16:09 ` Krzysztof Błaszkowski
1 sibling, 0 replies; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 15:11 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer
On Friday 30 November 2007 22:33, Darrick J. Wong wrote:
> On Fri, Nov 30, 2007 at 10:22:07AM +0100, Krzysztof Błaszkowski wrote:
> > Hello all,
> >
> > I noticed this crash in the syslog. Furthermore, if the aic94xx is connected
> > to a single SATA drive only, there is no crash, but the device is not
> > recognized either (the mysterious "ERROR: Unidentified device type 5").
>
> There's been a substantial amount of bugfixes (as well as SATA support)
> that went into the aic94xx/libsas code between .22 and .23; could you
> please give that a try?
Thank you. I've tried 2.6.23.9 and it seems to work okay; indeed, many changes
were made, some of them by you.
>
> Also, what kind of devices are attached when the system crashes? From
> that stack trace it looks like the software thought there was a SATA
> disk attached to an expander...?
Yes, I connected the aic94xx to the expander (LSISASX28), which was loaded with
16 drives.
Best regards,
Krzysztof
>
> --D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-11-30 21:33 ` Darrick J. Wong
2007-12-03 15:11 ` Krzysztof Błaszkowski
@ 2007-12-03 16:09 ` Krzysztof Błaszkowski
2007-12-03 19:36 ` Darrick J. Wong
1 sibling, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 16:09 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer
[-- Attachment #1: Type: text/plain, Size: 354 bytes --]
I also noticed another failure when I removed a drive. Nothing reported the
removal (i.e. the block device and the corresponding sg device stayed
registered), so I ran dd on this truly "virtual" drive.
dd went into D state (as did scsi_wq). I think this shouldn't happen,
regardless of whether it was an AIC failure or an LSI expander failure.
>
> --D
Regards,
Krzysztof
[-- Attachment #2: hdd-removal-failure.log --]
[-- Type: text/x-log, Size: 4629 bytes --]
ata26.00: ATA-6: ST3120026AS, 3.18, max UDMA/133
ata26.00: 234441648 sectors, multi 0: LBA48
ata26.00: ata_hpa_resize 1: hpa sectors (1) is smaller than sectors (234441648)
ata26.00: configured for UDMA/133
scsi 6:0:20:0: Direct-Access ATA ST3120026AS 3.18 PQ: 0 ANSI: 5
sd 6:0:20:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 6:0:20:0: [sdb] Write Protect is off
sd 6:0:20:0: [sdb] Mode Sense: 00 3a 00 00
sd 6:0:20:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:20:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 6:0:20:0: [sdb] Write Protect is off
sd 6:0:20:0: [sdb] Mode Sense: 00 3a 00 00
sd 6:0:20:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 6:0:20:0: [sdb] Attached SCSI disk
sd 6:0:20:0: Attached scsi generic sg1 type 0
sd 6:0:20:0: [sdb] Synchronizing SCSI cache
ata26: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata26: status=0x01 { Error }
ata26: error=0x04 { DriveStatusError }
sd 6:0:20:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
sd 6:0:19:0: [sda] Synchronizing SCSI cache
SysRq : Show Blocked State
task PC stack pid father
scsi_wq_6 D 40246817 0 3727 2
f3d7dc64 00000046 f72d5550 40246817 00000006 40128e47 f618b468 f73deac0
42748e00 f72d5698 f72d5550 f3d7dd30 f3d7dd34 f3d7dc80 f3d7dcb8 40402896
00000000 f72d5550 4011b8a0 00000000 00000000 4010fa7d f618b3fc f76d8070
Call Trace:
[<40246817>] elv_next_request+0xb7/0x210
[<40128e47>] lock_timer_base+0x27/0x60
[<40402896>] wait_for_completion+0x86/0xc0
[<4011b8a0>] default_wake_function+0x0/0x10
[<4010fa7d>] native_smp_send_reschedule+0x1d/0x30
[<4011b8a0>] default_wake_function+0x0/0x10
[<4024a511>] blk_execute_rq+0xa1/0xe0
[<4024a770>] blk_end_sync_rq+0x0/0x30
[<4013426b>] autoremove_wake_function+0x1b/0x50
[<4011b8e7>] __wake_up_common+0x37/0x70
[<403067a3>] scsi_execute+0xe3/0x110
[<40306845>] scsi_execute_req+0x75/0xb0
[<4031a860>] sd_sync_cache+0x70/0xb0
[<40258ccf>] kobject_get+0xf/0x20
[<4031ce34>] sd_shutdown+0x64/0x140
[<4031cbe2>] sd_remove+0x32/0x70
[<402e15c4>] __device_release_driver+0x94/0xb0
[<402e15fe>] device_release_driver+0x1e/0x40
[<402e0869>] bus_remove_device+0x59/0x80
[<402dee33>] device_del+0x53/0x2c0
[<4030bed1>] __scsi_remove_device+0x51/0x90
[<4030bf2f>] scsi_remove_device+0x1f/0x30
[<4030bfcf>] __scsi_remove_target+0x8f/0xc0
[<4030c000>] __remove_child+0x0/0x20
[<4030c018>] __remove_child+0x18/0x20
[<402df0f2>] device_for_each_child+0x22/0x40
[<4030c05e>] scsi_remove_target+0x3e/0x50
[<f8d82f88>] sas_rphy_remove+0x58/0x80 [scsi_transport_sas]
[<f8d82f28>] sas_rphy_delete+0x8/0x10 [scsi_transport_sas]
[<f8dbb75e>] sas_unregister_dev+0x8e/0xa0 [libsas]
[<f8dbe62f>] sas_unregister_devs_sas_addr+0x11f/0x130 [libsas]
[<f8dbe916>] sas_rediscover_dev+0x116/0x150 [libsas]
[<f8dbea02>] sas_rediscover+0xb2/0xe0 [libsas]
[<f8dbb880>] sas_revalidate_domain+0x0/0x50 [libsas]
[<f8dbea61>] sas_ex_revalidate_domain+0x31/0x70 [libsas]
[<40130511>] run_workqueue+0x71/0x100
[<4013061f>] worker_thread+0x7f/0xd0
[<40134250>] autoremove_wake_function+0x0/0x50
[<4040254a>] schedule+0x21a/0x4e0
[<40134250>] autoremove_wake_function+0x0/0x50
[<401305a0>] worker_thread+0x0/0xd0
[<40133ca4>] kthread+0x64/0xa0
[<40133c40>] kthread+0x0/0xa0
[<40104887>] kernel_thread_helper+0x7/0x10
=======================
dd D 40249148 0 18935 16194
f1fc7d88 00000086 f7d41aa0 40249148 00000000 00000000 4237b300 f3c52900
4273fe00 f7d41be8 f7d41aa0 4273fe00 f1fc7de4 42708a64 f1fc7d94 40402eed
f1fc7ddc 00000000 401517c5 4040318f 40151780 401342a0 f1fc7ddc f1fc7dd8
Call Trace:
[<40249148>] blk_backing_dev_unplug+0x48/0xa0
[<40402eed>] io_schedule+0x1d/0x30
[<401517c5>] sync_page+0x45/0x50
[<4040318f>] __wait_on_bit_lock+0x3f/0x70
[<40151780>] sync_page+0x0/0x50
[<401342a0>] wake_bit_function+0x0/0x60
[<401520ca>] __lock_page+0x9a/0xb0
[<401342a0>] wake_bit_function+0x0/0x60
[<401342a0>] wake_bit_function+0x0/0x60
[<4015279e>] do_generic_mapping_read+0x22e/0x4b0
[<40152da0>] generic_file_aio_read+0x1c0/0x1f0
[<40152a20>] file_read_actor+0x0/0x110
[<40172d6d>] do_sync_read+0xbd/0x110
[<40134250>] autoremove_wake_function+0x0/0x50
[<40116455>] do_page_fault+0x1b5/0x630
[<4012cd5f>] sys_rt_sigaction+0x5f/0xb0
[<40172e83>] vfs_read+0xc3/0x150
[<401731c1>] sys_read+0x41/0x70
[<40103c36>] sysenter_past_esp+0x5f/0x85
[<40400000>] clip_setup+0x20/0x50
=======================
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-12-03 16:09 ` Krzysztof Błaszkowski
@ 2007-12-03 19:36 ` Darrick J. Wong
2007-12-03 19:43 ` Jeff Garzik
2007-12-03 20:06 ` Krzysztof Błaszkowski
0 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2007-12-03 19:36 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: linux-scsi, vst, Alexis Bruemmer
On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
>
> I also noticed another failure when I removed a drive. Nothing reported the
> removal (i.e. the block device and the corresponding sg device stayed
> registered), so I ran dd on this truly "virtual" drive.
>
> dd went into D state (as did scsi_wq). I think this shouldn't happen,
> regardless of whether it was an AIC failure or an LSI expander failure.
"It's wireless!" ;)
Seriously, though, it's a good idea to tell the kernel that you're
about to unplug a disk before actually doing it:
echo 1 > /sys/block/sdX/device/delete
This way, the kernel can tell the disk to flush its caches long before
power actually gets removed. Otherwise, the device removal code can
get hung up just like you observed, and whatever's in the write cache
may or may not actually get written to the media.
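For anyone who wants to do the same thing from a program rather than a shell, the sysfs write above can be sketched in C roughly as follows. This is a minimal sketch, not a tested tool: `scsi_delete_path` and `scsi_delete_disk` are hypothetical helper names made up for this example, the caller is assumed to run as root, and the disk name ("sdb", for instance) is assumed to be a plain kernel block device name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<disk>/device/delete" into buf.
 * Returns 0 on success, -1 if the name is unusable or buf is too small.
 */
static int scsi_delete_path(const char *disk, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	if (!disk || !*disk || strchr(disk, '/'))
		return -1;
	n = snprintf(buf, len, "/sys/block/%s/device/delete", disk);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask the kernel to detach the disk before it is physically pulled,
 * the equivalent of: echo 1 > /sys/block/<disk>/device/delete
 * Returns 0 on success, -1 on any error.
 */
static int scsi_delete_disk(const char *disk)
{
	char path[128];
	FILE *f;
	int ok;

	if (scsi_delete_path(disk, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = (fputs("1", f) >= 0);
	if (fclose(f) != 0) /* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling `scsi_delete_disk("sdb")` and checking for a 0 return before pulling the drive is the intended use. Rejecting names that contain '/' is just a guard so the helper cannot be pointed outside /sys/block.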
--D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-12-03 19:36 ` Darrick J. Wong
@ 2007-12-03 19:43 ` Jeff Garzik
2007-12-03 21:31 ` Darrick J. Wong
2007-12-03 20:06 ` Krzysztof Błaszkowski
1 sibling, 1 reply; 15+ messages in thread
From: Jeff Garzik @ 2007-12-03 19:43 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
Darrick J. Wong wrote:
> On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
>> I also noticed another failure when I removed a drive. Nothing reported the
>> removal (i.e. the block device and the corresponding sg device stayed
>> registered), so I ran dd on this truly "virtual" drive.
>>
>> dd went into D state (as did scsi_wq). I think this shouldn't happen,
>> regardless of whether it was an AIC failure or an LSI expander failure.
>
> "It's wireless!" ;)
>
> Seriously, though, it's a good idea to tell the kernel that you're
> about to unplug a disk before actually doing it:
>
> echo 1 > /sys/block/sdX/device/delete
>
> This way, the kernel can tell the disk to flush its caches long before
> power actually gets removed. Otherwise, the device removal code can
> get hung up just like you observed, and whatever's in the write cache
> may or may not actually get written to the media.
What you say is quite true about write cache -- you can clearly lose
some data by hot-unplugging a device. And there's nothing we can do
about that.
But what do you mean by "device removal code can get hung up"? That
sounds like a bug we should fix.
Jeff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-12-03 19:36 ` Darrick J. Wong
2007-12-03 19:43 ` Jeff Garzik
@ 2007-12-03 20:06 ` Krzysztof Błaszkowski
2007-12-04 22:35 ` [PATCH] libsas: Don't issue commands to devices that have been hot-removed Darrick J. Wong
1 sibling, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 20:06 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer
Hi Darrick,
On Monday 03 December 2007 20:36, Darrick J. Wong wrote:
> On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
> > I also noticed another failure when I removed a drive. Nothing reported the
> > removal (i.e. the block device and the corresponding sg device stayed
> > registered), so I ran dd on this truly "virtual" drive.
> >
> > dd went into D state (as did scsi_wq). I think this shouldn't happen,
> > regardless of whether it was an AIC failure or an LSI expander failure.
>
> "It's wireless!" ;)
Yep :) and energy from positive thinking spins the disk's platters ;)
>
> Seriously, though, it's a good idea to tell the kernel that you're
> about to unplug a disk before actually doing it:
>
> echo 1 > /sys/block/sdX/device/delete
>
> This way, the kernel can tell the disk to flush its caches long before
> power actually gets removed. Otherwise, the device removal code can
> get hung up just like you observed, and whatever's in the write cache
> may or may not actually get written to the media.
>
Just imagine a rainy Monday and someone who puts a hand on the drive and then
has to reboot the whole box.
Thanks,
Krzysztof
> --D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
2007-12-03 19:43 ` Jeff Garzik
@ 2007-12-03 21:31 ` Darrick J. Wong
0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2007-12-03 21:31 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
On Mon, Dec 03, 2007 at 02:43:09PM -0500, Jeff Garzik wrote:
> But what do you mean by "device removal code can get hung up"? That sounds
> like a bug we should fix.
At the moment, libsas' sas_rphy_remove function doesn't distinguish between
removing a device before or after the disk has been disconnected.
Hence, sd_shutdown tries to tell the disk to flush the write cache, even
in the case that the disk is already gone. Maybe the solution is to
modify aic94xx to remove the device's DDB registration prior to sending
the "device gone" event to libsas so that all subsequent commands bounce
with "no such device" instead of going out to lunch.
(I'll look into this later, as I myself am going out to lunch right now.)
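To make the "bounce instead of going out to lunch" idea concrete, here is a small user-space model in C. It is only a sketch under stated assumptions: `struct model_device`, `struct model_cmd`, `model_unregister` and `model_queuecommand` are simplified stand-ins invented for this example, not libsas or aic94xx code, and the sketch uses a plain per-device flag (the approach the follow-up patch in this thread takes) rather than the DDB removal floated above. DID_BAD_TARGET is the host-byte value from the kernel's SCSI headers.

```c
#include <assert.h>
#include <stddef.h>

#define DID_BAD_TARGET 0x04 /* SCSI host byte: target no longer addressable */

/* Simplified stand-ins; only the fields this sketch needs. */
struct model_device {
	int gone;          /* set once the device has been hot-removed */
	int commands_sent; /* commands actually forwarded to "hardware" */
};

struct model_cmd {
	int result; /* host byte lives in bits 23..16, as in scsi_cmnd */
	int done;   /* nonzero once the command has been completed */
};

/* Mark the device as removed before any teardown commands are issued. */
static void model_unregister(struct model_device *dev)
{
	assert(dev != NULL);
	dev->gone = 1;
}

/*
 * Queue a command. If the device is already gone, complete the command
 * immediately with DID_BAD_TARGET instead of sending it and waiting for
 * a timeout that can never be satisfied.
 */
static int model_queuecommand(struct model_cmd *cmd, struct model_device *dev)
{
	assert(cmd != NULL && dev != NULL);
	if (dev->gone) {
		cmd->result = DID_BAD_TARGET << 16;
		cmd->done = 1;
		return 0;
	}
	dev->commands_sent++;
	return 0;
}
```

In this model a cache-flush command issued after model_unregister() completes at once with an error, so a caller such as sd_shutdown would see a failure rather than block in D state.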
--D
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-03 20:06 ` Krzysztof Błaszkowski
@ 2007-12-04 22:35 ` Darrick J. Wong
2007-12-04 22:48 ` Jeff Garzik
0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2007-12-04 22:35 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: linux-scsi, vst, Alexis Bruemmer
Hrm... does this patch help? You'll get a bunch of ATA/SAS disk errors
printed to the screen if you yank the disk, but at least libsas won't
get stuck waiting for the cache-flush commands to time out.
---
sd will get hung up issuing commands to flush write cache if a SAS device
is unplugged without warning. Change libsas to reject commands to domain
devices that have already gone away.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
drivers/scsi/libsas/sas_ata.c | 4 ++++
drivers/scsi/libsas/sas_expander.c | 3 +++
drivers/scsi/libsas/sas_port.c | 2 ++
drivers/scsi/libsas/sas_scsi_host.c | 7 +++++++
include/scsi/libsas.h | 1 +
5 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0829b55..f5e5213 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -161,6 +161,10 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
unsigned int num = 0;
unsigned int xfer = 0;
+ /* If the device fell off, no sense in issuing commands */
+ if (dev->gone)
+ return AC_ERR_SYSTEM;
+
task = sas_alloc_task(GFP_ATOMIC);
if (!task)
return AC_ERR_SYSTEM;
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 27674fe..4ba4d2a 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1680,6 +1680,7 @@ static void sas_unregister_ex_tree(struct domain_device *dev)
struct domain_device *child, *n;
list_for_each_entry_safe(child, n, &ex->children, siblings) {
+ child->gone = 1;
if (child->dev_type == EDGE_DEV ||
child->dev_type == FANOUT_DEV)
sas_unregister_ex_tree(child);
@@ -1699,6 +1700,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
list_for_each_entry_safe(child, n, &ex_dev->children, siblings) {
if (SAS_ADDR(child->sas_addr) ==
SAS_ADDR(phy->attached_sas_addr)) {
+ child->gone = 1;
if (child->dev_type == EDGE_DEV ||
child->dev_type == FANOUT_DEV)
sas_unregister_ex_tree(child);
@@ -1707,6 +1709,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
break;
}
}
+ parent->gone = 1;
sas_disable_routing(parent, phy->attached_sas_addr);
memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
sas_port_delete_phy(phy->port, phy->phy);
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index b6f0243..2e82097 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -144,6 +144,8 @@ void sas_deform_port(struct asd_sas_phy *phy)
port->port_dev->pathways--;
if (port->num_phys == 1) {
+ if (port->port_dev)
+ port->port_dev->gone = 1;
sas_unregister_domain_devices(port);
sas_port_delete(port->port);
port->port = NULL;
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index c29ba47..61d2679 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -228,6 +228,13 @@ int sas_queuecommand(struct scsi_cmnd *cmd,
goto out;
}
+ /* If the device fell off, no sense in issuing commands */
+ if (dev->gone) {
+ cmd->result = DID_BAD_TARGET << 16;
+ scsi_done(cmd);
+ goto out;
+ }
+
res = -ENOMEM;
task = sas_create_task(cmd, dev, GFP_ATOMIC);
if (!task)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8ad7465..73c5b15 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -207,6 +207,7 @@ struct domain_device {
};
void *lldd_dev;
+ int gone;
};
struct sas_discovery_event {
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-04 22:35 ` [PATCH] libsas: Don't issue commands to devices that have been hot-removed Darrick J. Wong
@ 2007-12-04 22:48 ` Jeff Garzik
2007-12-04 23:17 ` Darrick J. Wong
0 siblings, 1 reply; 15+ messages in thread
From: Jeff Garzik @ 2007-12-04 22:48 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
Darrick J. Wong wrote:
> Hrm... does this patch help? You'll get a bunch of ATA/SAS disk errors
> printed to the screen if you yank the disk, but at least libsas won't
> get stuck waiting for the cache-flush commands to time out.
> ---
> sd will get hung up issuing commands to flush write cache if a SAS device
> is unplugged without warning. Change libsas to reject commands to domain
> devices that have already gone away.
>
> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> ---
>
> drivers/scsi/libsas/sas_ata.c | 4 ++++
> drivers/scsi/libsas/sas_expander.c | 3 +++
> drivers/scsi/libsas/sas_port.c | 2 ++
> drivers/scsi/libsas/sas_scsi_host.c | 7 +++++++
> include/scsi/libsas.h | 1 +
> 5 files changed, 17 insertions(+), 0 deletions(-)
Seems sane...
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 0829b55..f5e5213 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -161,6 +161,10 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
> unsigned int num = 0;
> unsigned int xfer = 0;
>
> + /* If the device fell off, no sense in issuing commands */
> + if (dev->gone)
> + return AC_ERR_SYSTEM;
> +
> task = sas_alloc_task(GFP_ATOMIC);
> if (!task)
> return AC_ERR_SYSTEM;
As an aside, issues like this really really imply a need to move libsas
away from the old libata EH stuff (like brking did with ipr, in patches).
Jeff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-04 22:48 ` Jeff Garzik
@ 2007-12-04 23:17 ` Darrick J. Wong
2007-12-04 23:40 ` Jeff Garzik
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Darrick J. Wong @ 2007-12-04 23:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:
> As an aside, issues like this really really imply a need to move libsas
> away from the old libata EH stuff (like brking did with ipr, in patches).
Hm... does the new libata EH handle the case of "device was
unplugged, don't bother trying to send any more commands"?
In general, I agree that sas-ata should adopt the new EH.
Unfortunately, I believe the old way of sas-ata configuring ATA ports is
somehow not compatible with the new EH stuff and causes a crash during
the device probe with my patch to move sas-ata to the new EH. If I
apply the patch that migrates sas-ata to use brking's latest ata-sas
configuration mechanism (the one that creates real ata_hosts), I see
(a) lots and lots of ATA hosts getting created (one per ATA port;
possibly undesirable if you've a SAS topology with a lot of SATA disks)
and (b) NCQ disks don't seem to work if you unplug the disk and plug
it back in (unless NCQ is disabled entirely). Jeff, by any chance have
you tried plugging SATA devices into your SAS controllers?
James Bottomley wondered if it would be easier to have sas-ata call only
into the parts of libata that convert SCSI commands to ATA taskfiles,
though I'm unsure how many wormy cans that would open.
--D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-04 23:17 ` Darrick J. Wong
@ 2007-12-04 23:40 ` Jeff Garzik
2007-12-06 16:55 ` Brian King
2008-02-25 23:39 ` Jeff Garzik
2 siblings, 0 replies; 15+ messages in thread
From: Jeff Garzik @ 2007-12-04 23:40 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
Darrick J. Wong wrote:
> On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:
>
>> As an aside, issues like this really really imply a need to move libsas
>> away from the old libata EH stuff (like brking did with ipr, in patches).
>
> Hm... does the new libata EH handle the case of "device was
> unplugged, don't bother trying to send any more commands"?
>
> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)
> and (b) NCQ disks don't seem to work if you unplug the disk and plug
> it back in (unless NCQ is disabled entirely). Jeff, by any chance have
> you tried plugging SATA devices into your SAS controllers?
aic94xx yes, bcm and mv no.
Will take a look though...
> James Bottomley wondered if it would be easier to have sas-ata call only
> into the parts of libata that convert SCSI commands to ATA taskfiles,
> though I'm unsure how many wormy cans that would open.
You want more than that.
You want to make sure libata is the place for knowledge about weird ATA
devices, SATA quirks, ATA device error handling (to be distinguished
from ATA /link/ error handling), and other areas.
That stuff shouldn't be duplicated, and you /really/ do not want to
re-learn all those lessons all over again ;-)
Jeff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-04 23:17 ` Darrick J. Wong
2007-12-04 23:40 ` Jeff Garzik
@ 2007-12-06 16:55 ` Brian King
2008-02-25 23:39 ` Jeff Garzik
2 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2007-12-06 16:55 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jeff Garzik, Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer
Darrick J. Wong wrote:
> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)
The new libata EH ends up spending more time in the error handling thread
than the old code did. One of the reasons having multiple ATA/SCSI hosts
is a good thing is that the host is the granularity of error handling, so it
prevents stalling all the other devices under that SAS HBA while we are
hitting errors on an ATAPI SATA device, for example.
Arguably, SATA users of libata already have one SCSI host per ATA port,
so my SAS patches really just bring SAS in line with that design...
-Brian
--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
2007-12-04 23:17 ` Darrick J. Wong
2007-12-04 23:40 ` Jeff Garzik
2007-12-06 16:55 ` Brian King
@ 2008-02-25 23:39 ` Jeff Garzik
2 siblings, 0 replies; 15+ messages in thread
From: Jeff Garzik @ 2008-02-25 23:39 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jeff Garzik, Krzysztof Błaszkowski, linux-scsi, vst,
Alexis Bruemmer
(digging through old email)
Darrick J. Wong wrote:
> On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:
>
>> As an aside, issues like this really really imply a need to move libsas
>> away from the old libata EH stuff (like brking did with ipr, in patches).
>
> Hm... does the new libata EH handle the case of "device was
> unplugged, don't bother trying to send any more commands"?
Yes, most certainly :) We wouldn't have hotplug support without that...
> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)
> and (b) NCQ disks don't seem to work if you unplug the disk and plug
> it back in (unless NCQ is disabled entirely). Jeff, by any chance have
> you tried plugging SATA devices into your SAS controllers?
Just tested mvsas here...
> James Bottomley wondered if it would be easier to have sas-ata call only
> into the parts of libata that convert SCSI commands to ATA taskfiles,
> though I'm unsure how many wormy cans that would open.
Like Brian K noted, libata-EH is heavily involved in "anything not
hotpath read/write", including but not limited to: PMP, hotplug, device
probing, device revalidation, explicit sequencing of ATA commands during
initialization (critical for getting many ATA devices working).
You don't want to reinvent or duplicate all those ATA device
initialization/revalidation quirks.
Jeff
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] libsas: Don't issue commands to devices that have been hot-removed
@ 2010-10-01 20:55 Dan Williams
0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2010-10-01 20:55 UTC (permalink / raw)
To: james.bottomley
Cc: Haipao Fan, linux-scsi, Jeff Garzik, Maciej Trela,
Patrick Thomson, Jeff Skirvin, Brian King, Darrick J. Wong
From: Darrick J. Wong <djwong@us.ibm.com>
sd will get hung up issuing commands to flush write cache if a SAS
device behind the expander is unplugged without warning. Change libsas
to reject commands to domain devices that have already gone away.
[maciej.trela@intel.com: removed setting ->gone in sas_deform_port() to
permit sync cache commands at module removal]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
linux-scsi-reference: <20071204223516.GA6767@tree.beaverton.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Patrick Thomson <patrick.s.thomson@intel.com>
Cc: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Tested-by: Haipao Fan <haipao.fan@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/scsi/libsas/sas_ata.c | 4 ++++
drivers/scsi/libsas/sas_expander.c | 3 +++
drivers/scsi/libsas/sas_scsi_host.c | 7 +++++++
include/scsi/libsas.h | 1 +
4 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 042153c..da2e740 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -162,6 +162,10 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
unsigned int xfer = 0;
unsigned int si;
+ /* If the device fell off, no sense in issuing commands */
+ if (dev->gone)
+ return AC_ERR_SYSTEM;
+
task = sas_alloc_task(GFP_ATOMIC);
if (!task)
return AC_ERR_SYSTEM;
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 83dd507..61d81f8 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1724,6 +1724,7 @@ static void sas_unregister_ex_tree(struct domain_device *dev)
struct domain_device *child, *n;
list_for_each_entry_safe(child, n, &ex->children, siblings) {
+ child->gone = 1;
if (child->dev_type == EDGE_DEV ||
child->dev_type == FANOUT_DEV)
sas_unregister_ex_tree(child);
@@ -1744,6 +1745,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
&ex_dev->children, siblings) {
if (SAS_ADDR(child->sas_addr) ==
SAS_ADDR(phy->attached_sas_addr)) {
+ child->gone = 1;
if (child->dev_type == EDGE_DEV ||
child->dev_type == FANOUT_DEV)
sas_unregister_ex_tree(child);
@@ -1752,6 +1754,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
break;
}
}
+ parent->gone = 1;
sas_disable_routing(parent, phy->attached_sas_addr);
}
memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0cfba9..1787bd2 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -228,6 +228,13 @@ int sas_queuecommand(struct scsi_cmnd *cmd,
goto out;
}
+ /* If the device fell off, no sense in issuing commands */
+ if (dev->gone) {
+ cmd->result = DID_BAD_TARGET << 16;
+ scsi_done(cmd);
+ goto out;
+ }
+
res = -ENOMEM;
task = sas_create_task(cmd, dev, GFP_ATOMIC);
if (!task)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index d06e13b..3dec194 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -205,6 +205,7 @@ struct domain_device {
};
void *lldd_dev;
+ int gone;
};
struct sas_discovery_event {
^ permalink raw reply related [flat|nested] 15+ messages in thread