* aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
@ 2007-11-30 9:22 Krzysztof Błaszkowski
  2007-11-30 21:33 ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Krzysztof Błaszkowski @ 2007-11-30 9:22 UTC (permalink / raw)
  To: linux-scsi; +Cc: Vladislav Bolkhovitin

Hello all,

I noticed this according to syslog. furthermore if aic94xx is connected to
single sata drive only then there is no crash but device is not recognized
too. (mysterious: "ERROR: Unidentified device type 5").

A crash recorded in syslog:

aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
ACPI: PCI Interrupt 0000:04:02.0[A] -> GSI 16 (level, low) -> IRQ 16
aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:04:02.0
scsi6 : aic94xx
PM: Adding info for No Bus:host6
PM: Adding info for No Bus:0000:04:02.0
PM: Removing info for No Bus:0000:04:02.0
aic94xx: Found sequencer Firmware version 1.1 (V30)
aic94xx: device 0000:04:02.0: SAS addr 500304800004ce20, PCBA SN ORG, 8 phys, 8 enabled phys, flash present, BIOS build 1822
PM: Adding info for No Bus:phy-6:0

<snip>

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000074
 printing eip:
f8e2daf9
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: aic94xx firmware_class libsas scsi_transport_sas nfsd exportfs nvram speedstep_lib freq_table thermal processor fan button battery edd ac ipv6 evdev joydev sr_mod ide_cd cdrom e1000 ehci_hcd i2c_i801 uhci_hcd rng_core dm_mod usbcore
CPU: 0
EIP: 0060:[<f8e2daf9>] Not tainted VLI
EFLAGS: 00010286 (2.6.22.8 #6)
EIP is at sas_rphy_add+0x9/0x100 [scsi_transport_sas]
eax: 00000000 ebx: 00000000 ecx: 00000004 edx: 00000282
esi: f2d8c080 edi: 00000000 ebp: f2d8c080 esp: f33bbe84
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process scsi_wq_6 (pid: 8265, ti=f33ba000 task=f5712a90 task.ti=f33ba000)
Stack: f2d8c080 00000000 00000000 f2d8c080 f2d8c0d7 f2d8c080 f8e8bdc2 f46b49e0
       f2d8c114 f8e8d040 f7c4446c f704e724 f704e6c0 00000000 00000001 00000000
       f46b49fc f8e8d741 f7c44438 00000000 f33bbedc 402e9267 f7c44380 ffffffed
Call Trace:
 [<f8e8bdc2>] sas_discover_sata+0x42/0x80 [libsas]
 [<f8e8d040>] sas_ex_discover_end_dev+0x120/0x2d0 [libsas]
 [<f8e8d741>] sas_ex_discover_dev+0x2d1/0x470 [libsas]
 [<402e9267>] attribute_container_device_trigger+0xa7/0xb0
 [<f8e8daa3>] sas_ex_discover_devices+0x83/0xb0 [libsas]
 [<f8e8e6d3>] sas_ex_level_discovery+0x43/0x70 [libsas]
 [<f8e8e71b>] sas_ex_bfs_disc+0x1b/0x30 [libsas]
 [<f8e8e76e>] sas_discover_root_expander+0x3e/0x80 [libsas]
 [<f8e8bf40>] sas_discover_domain+0x0/0xc0 [libsas]
 [<f8e8bfea>] sas_discover_domain+0xaa/0xc0 [libsas]
 [<40131541>] run_workqueue+0x71/0x100
 [<4013167c>] worker_thread+0xac/0x110
 [<401352a0>] autoremove_wake_function+0x0/0x50
 [<401352a0>] autoremove_wake_function+0x0/0x50
 [<401315d0>] worker_thread+0x0/0x110
 [<40134d24>] kthread+0x64/0xa0
 [<40134cc0>] kthread+0x0/0xa0
 [<401048b7>] kernel_thread_helper+0x7/0x10
 =======================
Code: f0 83 c4 1c 5b 5e 5f 5d c3 0f 0b 8d b4 26 00 00 00 00 eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 57 89 c7 56 53 83 ec 08 <8b> 70 74 8b 5e 74 eb 0b 8b 43 74 31 d2 85 c0 74 13 89 c3 89 d8
EIP: [<f8e2daf9>] sas_rphy_add+0x9/0x100 [scsi_transport_sas] SS:ESP 0068:f33bbe84

let me know if you need any more information. i used latest firmware
available from Adaptec's site.

Best regards,
Krzysztof Blaszkowski

Systemy mikroprocesorowe
Storrady 1
PL71602 Szczecin, Poland
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-11-30 9:22 aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives Krzysztof Błaszkowski
@ 2007-11-30 21:33 ` Darrick J. Wong
  2007-12-03 15:11 ` Krzysztof Błaszkowski
  2007-12-03 16:09 ` Krzysztof Błaszkowski
  0 siblings, 2 replies; 14+ messages in thread
From: Darrick J. Wong @ 2007-11-30 21:33 UTC (permalink / raw)
  To: Krzysztof Błaszkowski; +Cc: linux-scsi, Vladislav Bolkhovitin, Alexis Bruemmer

On Fri, Nov 30, 2007 at 10:22:07AM +0100, Krzysztof Błaszkowski wrote:
> Hello all,
>
> I noticed this according to syslog. furthermore if aic94xx is connected to
> single sata drive only then there is no crash but device is not recognized
> too. (mysterious: "ERROR: Unidentified device type 5").

There's been a substantial amount of bugfixes (as well as SATA support)
that went into the aic94xx/libsas code between .22 and .23; could you
please give that a try?

Also, what kind of devices are attached when the system crashes? From
that stack trace it looks like the software thought there was a SATA
disk attached to an expander...?

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-11-30 21:33 ` Darrick J. Wong
@ 2007-12-03 15:11 ` Krzysztof Błaszkowski
  2007-12-03 16:09 ` Krzysztof Błaszkowski
  1 sibling, 0 replies; 14+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 15:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer

On Friday 30 November 2007 22:33, Darrick J. Wong wrote:
> On Fri, Nov 30, 2007 at 10:22:07AM +0100, Krzysztof Błaszkowski wrote:
> > Hello all,
> >
> > I noticed this according to syslog. furthermore if aic94xx is connected
> > to single sata drive only then there is no crash but device is not
> > recognized too. (mysterious: "ERROR: Unidentified device type 5").
>
> There's been a substantial amount of bugfixes (as well as SATA support)
> that went into the aic94xx/libsas code between .22 and .23; could you
> please give that a try?

thank you. I've tried 2.6.23.9 and it seems to work okay and indeed there
were made many changes some of them by you.

>
> Also, what kind of devices are attached when the system crashes? From
> that stack trace it looks like the software thought there was a SATA
> disk attached to an expander...?

yes, i connected aic to the expander (LSISASX28) which was loaded with 16
drives.

Best regards,
Krzysztof

>
> --D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-11-30 21:33 ` Darrick J. Wong
  2007-12-03 15:11 ` Krzysztof Błaszkowski
@ 2007-12-03 16:09 ` Krzysztof Błaszkowski
  2007-12-03 19:36 ` Darrick J. Wong
  1 sibling, 1 reply; 14+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 16:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer

[-- Attachment #1: Type: text/plain, Size: 354 bytes --]


I noticed also another failure when i removed a drive. The event was not
notified by anything (ie the block device and corresponding sg were
registered) so i run dd on this truly "virtual" drive.

dd reached D state (as well as scsi_wq) . i think it shouldn't happen no
matter it was AIC failure or LSI expander failure.

>
> --D

Regards,
Krzysztof

[-- Attachment #2: hdd-removal-failure.log --]
[-- Type: text/x-log, Size: 4629 bytes --]

ata26.00: ATA-6: ST3120026AS, 3.18, max UDMA/133
ata26.00: 234441648 sectors, multi 0: LBA48
ata26.00: ata_hpa_resize 1: hpa sectors (1) is smaller than sectors (234441648)
ata26.00: configured for UDMA/133
scsi 6:0:20:0: Direct-Access ATA ST3120026AS 3.18 PQ: 0 ANSI: 5
sd 6:0:20:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 6:0:20:0: [sdb] Write Protect is off
sd 6:0:20:0: [sdb] Mode Sense: 00 3a 00 00
sd 6:0:20:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:20:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 6:0:20:0: [sdb] Write Protect is off
sd 6:0:20:0: [sdb] Mode Sense: 00 3a 00 00
sd 6:0:20:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: unknown partition table
sd 6:0:20:0: [sdb] Attached SCSI disk
sd 6:0:20:0: Attached scsi generic sg1 type 0
sd 6:0:20:0: [sdb] Synchronizing SCSI cache
ata26: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata26: status=0x01 { Error }
ata26: error=0x04 { DriveStatusError }
sd 6:0:20:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
sd 6:0:19:0: [sda] Synchronizing SCSI cache
SysRq : Show Blocked State
 task PC stack pid father
scsi_wq_6 D 40246817 0 3727 2
       f3d7dc64 00000046 f72d5550 40246817 00000006 40128e47 f618b468 f73deac0
       42748e00 f72d5698 f72d5550 f3d7dd30 f3d7dd34 f3d7dc80 f3d7dcb8 40402896
       00000000 f72d5550 4011b8a0 00000000 00000000 4010fa7d f618b3fc f76d8070
Call Trace:
 [<40246817>] elv_next_request+0xb7/0x210
 [<40128e47>] lock_timer_base+0x27/0x60
 [<40402896>] wait_for_completion+0x86/0xc0
 [<4011b8a0>] default_wake_function+0x0/0x10
 [<4010fa7d>] native_smp_send_reschedule+0x1d/0x30
 [<4011b8a0>] default_wake_function+0x0/0x10
 [<4024a511>] blk_execute_rq+0xa1/0xe0
 [<4024a770>] blk_end_sync_rq+0x0/0x30
 [<4013426b>] autoremove_wake_function+0x1b/0x50
 [<4011b8e7>] __wake_up_common+0x37/0x70
 [<403067a3>] scsi_execute+0xe3/0x110
 [<40306845>] scsi_execute_req+0x75/0xb0
 [<4031a860>] sd_sync_cache+0x70/0xb0
 [<40258ccf>] kobject_get+0xf/0x20
 [<4031ce34>] sd_shutdown+0x64/0x140
 [<4031cbe2>] sd_remove+0x32/0x70
 [<402e15c4>] __device_release_driver+0x94/0xb0
 [<402e15fe>] device_release_driver+0x1e/0x40
 [<402e0869>] bus_remove_device+0x59/0x80
 [<402dee33>] device_del+0x53/0x2c0
 [<4030bed1>] __scsi_remove_device+0x51/0x90
 [<4030bf2f>] scsi_remove_device+0x1f/0x30
 [<4030bfcf>] __scsi_remove_target+0x8f/0xc0
 [<4030c000>] __remove_child+0x0/0x20
 [<4030c018>] __remove_child+0x18/0x20
 [<402df0f2>] device_for_each_child+0x22/0x40
 [<4030c05e>] scsi_remove_target+0x3e/0x50
 [<f8d82f88>] sas_rphy_remove+0x58/0x80 [scsi_transport_sas]
 [<f8d82f28>] sas_rphy_delete+0x8/0x10 [scsi_transport_sas]
 [<f8dbb75e>] sas_unregister_dev+0x8e/0xa0 [libsas]
 [<f8dbe62f>] sas_unregister_devs_sas_addr+0x11f/0x130 [libsas]
 [<f8dbe916>] sas_rediscover_dev+0x116/0x150 [libsas]
 [<f8dbea02>] sas_rediscover+0xb2/0xe0 [libsas]
 [<f8dbb880>] sas_revalidate_domain+0x0/0x50 [libsas]
 [<f8dbea61>] sas_ex_revalidate_domain+0x31/0x70 [libsas]
 [<40130511>] run_workqueue+0x71/0x100
 [<4013061f>] worker_thread+0x7f/0xd0
 [<40134250>] autoremove_wake_function+0x0/0x50
 [<4040254a>] schedule+0x21a/0x4e0
 [<40134250>] autoremove_wake_function+0x0/0x50
 [<401305a0>] worker_thread+0x0/0xd0
 [<40133ca4>] kthread+0x64/0xa0
 [<40133c40>] kthread+0x0/0xa0
 [<40104887>] kernel_thread_helper+0x7/0x10
 =======================
dd D 40249148 0 18935 16194
       f1fc7d88 00000086 f7d41aa0 40249148 00000000 00000000 4237b300 f3c52900
       4273fe00 f7d41be8 f7d41aa0 4273fe00 f1fc7de4 42708a64 f1fc7d94 40402eed
       f1fc7ddc 00000000 401517c5 4040318f 40151780 401342a0 f1fc7ddc f1fc7dd8
Call Trace:
 [<40249148>] blk_backing_dev_unplug+0x48/0xa0
 [<40402eed>] io_schedule+0x1d/0x30
 [<401517c5>] sync_page+0x45/0x50
 [<4040318f>] __wait_on_bit_lock+0x3f/0x70
 [<40151780>] sync_page+0x0/0x50
 [<401342a0>] wake_bit_function+0x0/0x60
 [<401520ca>] __lock_page+0x9a/0xb0
 [<401342a0>] wake_bit_function+0x0/0x60
 [<401342a0>] wake_bit_function+0x0/0x60
 [<4015279e>] do_generic_mapping_read+0x22e/0x4b0
 [<40152da0>] generic_file_aio_read+0x1c0/0x1f0
 [<40152a20>] file_read_actor+0x0/0x110
 [<40172d6d>] do_sync_read+0xbd/0x110
 [<40134250>] autoremove_wake_function+0x0/0x50
 [<40116455>] do_page_fault+0x1b5/0x630
 [<4012cd5f>] sys_rt_sigaction+0x5f/0xb0
 [<40172e83>] vfs_read+0xc3/0x150
 [<401731c1>] sys_read+0x41/0x70
 [<40103c36>] sysenter_past_esp+0x5f/0x85
 [<40400000>] clip_setup+0x20/0x50
 =======================

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-12-03 16:09 ` Krzysztof Błaszkowski
@ 2007-12-03 19:36 ` Darrick J. Wong
  2007-12-03 19:43 ` Jeff Garzik
  2007-12-03 20:06 ` Krzysztof Błaszkowski
  0 siblings, 2 replies; 14+ messages in thread
From: Darrick J. Wong @ 2007-12-03 19:36 UTC (permalink / raw)
  To: Krzysztof Błaszkowski; +Cc: linux-scsi, vst, Alexis Bruemmer

On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
>
> I noticed also another failure when i removed a drive. The event was not
> notified by anything (ie the block device and corresponding sg were
> registered) so i run dd on this truly "virtual" drive.
>
> dd reached D state (as well as scsi_wq) . i think it shouldn't happen no
> matter it was AIC failure or LSI expander failure.

"It's wireless!" ;)

Seriously, though, it's a good idea to tell the kernel that you're
about to unplug a disk before actually doing it:

	echo 1 > /sys/block/sdX/device/delete

This way, the kernel can tell the disk to flush its caches long before
power actually gets removed. Otherwise, the device removal code can
get hung up just like you observed, and whatever's in the write cache
may or may not actually get written to the media.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-12-03 19:36 ` Darrick J. Wong
@ 2007-12-03 19:43 ` Jeff Garzik
  2007-12-03 21:31 ` Darrick J. Wong
  2007-12-03 20:06 ` Krzysztof Błaszkowski
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2007-12-03 19:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

Darrick J. Wong wrote:
> On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
>> I noticed also another failure when i removed a drive. The event was not
>> notified by anything (ie the block device and corresponding sg were
>> registered) so i run dd on this truly "virtual" drive.
>>
>> dd reached D state (as well as scsi_wq) . i think it shouldn't happen no
>> matter it was AIC failure or LSI expander failure.
>
> "It's wireless!" ;)
>
> Seriously, though, it's a good idea to tell the kernel that you're
> about to unplug a disk before actually doing it:
>
> 	echo 1 > /sys/block/sdX/device/delete
>
> This way, the kernel can tell the disk to flush its caches long before
> power actually gets removed. Otherwise, the device removal code can
> get hung up just like you observed, and whatever's in the write cache
> may or may not actually get written to the media.

What you say is quite true about write cache -- you can clearly lose
some data by hot-unplugging a device. And there's nothing we can do
about that.

But what do you mean by "device removal code can get hung up"? That sounds
like a bug we should fix.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-12-03 19:43 ` Jeff Garzik
@ 2007-12-03 21:31 ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2007-12-03 21:31 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

On Mon, Dec 03, 2007 at 02:43:09PM -0500, Jeff Garzik wrote:

> But what do you mean by "device removal code can get hung up"? That sounds
> like a bug we should fix.

At the moment, libsas' sas_rphy_remove function doesn't distinguish
between removing a device before or after the disk has been
disconnected. Hence, sd_shutdown tries to tell the disk to flush the
write cache, even in the case that the disk is already gone. Maybe the
solution is to modify aic94xx to remove the device's DDB registration
prior to sending the "device gone" event to libsas so that all
subsequent commands bounce with "no such device" instead of going out to
lunch.

(I'll look into this later, as I myself am going out to lunch right now.)

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: aic94xx or libsas crash on X7DB3 supermicro with enclosure and sata drives
  2007-12-03 19:36 ` Darrick J. Wong
  2007-12-03 19:43 ` Jeff Garzik
@ 2007-12-03 20:06 ` Krzysztof Błaszkowski
  2007-12-04 22:35 ` [PATCH] libsas: Don't issue commands to devices that have been hot-removed Darrick J. Wong
  1 sibling, 1 reply; 14+ messages in thread
From: Krzysztof Błaszkowski @ 2007-12-03 20:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-scsi, vst, Alexis Bruemmer

Hi Darrick,

On Monday 03 December 2007 20:36, Darrick J. Wong wrote:
> On Mon, Dec 03, 2007 at 05:09:54PM +0100, Krzysztof Błaszkowski wrote:
> > I noticed also another failure when i removed a drive. The event was not
> > notified by anything (ie the block device and corresponding sg were
> > registered) so i run dd on this truly "virtual" drive.
> >
> > dd reached D state (as well as scsi_wq) . i think it shouldn't happen no
> > matter it was AIC failure or LSI expander failure.
>
> "It's wireless!" ;)

yep :) and energy from positive thinking spins disk's plates ;)

>
> Seriously, though, it's a good idea to tell the kernel that you're
> about to unplug a disk before actually doing it:
>
> 	echo 1 > /sys/block/sdX/device/delete
>
> This way, the kernel can tell the disk to flush its caches long before
> power actually gets removed. Otherwise, the device removal code can
> get hung up just like you observed, and whatever's in the write cache
> may or may not actually get written to the media.
>

imagine just raining Monday and someone who put hand on the drive thus he
had to reboot whole box.

Thanks,
Krzysztof

> --D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-03 20:06 ` Krzysztof Błaszkowski
@ 2007-12-04 22:35 ` Darrick J. Wong
  2007-12-04 22:48 ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2007-12-04 22:35 UTC (permalink / raw)
  To: Krzysztof Błaszkowski; +Cc: linux-scsi, vst, Alexis Bruemmer

Hrm... does this patch help? You'll get a bunch of ATA/SAS disk errors
printed to the screen if you yank the disk, but at least libsas won't
get stuck waiting for the cache-flush commands to time out.
---
sd will get hung up issuing commands to flush write cache if a SAS device
is unplugged without warning. Change libsas to reject commands to domain
devices that have already gone away.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---

 drivers/scsi/libsas/sas_ata.c       |    4 ++++
 drivers/scsi/libsas/sas_expander.c  |    3 +++
 drivers/scsi/libsas/sas_port.c      |    2 ++
 drivers/scsi/libsas/sas_scsi_host.c |    7 +++++++
 include/scsi/libsas.h               |    1 +
 5 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0829b55..f5e5213 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -161,6 +161,10 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
 	unsigned int num = 0;
 	unsigned int xfer = 0;
 
+	/* If the device fell off, no sense in issuing commands */
+	if (dev->gone)
+		return AC_ERR_SYSTEM;
+
 	task = sas_alloc_task(GFP_ATOMIC);
 	if (!task)
 		return AC_ERR_SYSTEM;
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 27674fe..4ba4d2a 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1680,6 +1680,7 @@ static void sas_unregister_ex_tree(struct domain_device *dev)
 	struct domain_device *child, *n;
 
 	list_for_each_entry_safe(child, n, &ex->children, siblings) {
+		child->gone = 1;
 		if (child->dev_type == EDGE_DEV ||
 		    child->dev_type == FANOUT_DEV)
 			sas_unregister_ex_tree(child);
@@ -1699,6 +1700,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 		list_for_each_entry_safe(child, n, &ex_dev->children, siblings) {
 			if (SAS_ADDR(child->sas_addr) ==
 			    SAS_ADDR(phy->attached_sas_addr)) {
+				child->gone = 1;
 				if (child->dev_type == EDGE_DEV ||
 				    child->dev_type == FANOUT_DEV)
 					sas_unregister_ex_tree(child);
@@ -1707,6 +1709,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 			break;
 		}
 	}
+	parent->gone = 1;
 	sas_disable_routing(parent, phy->attached_sas_addr);
 	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
 	sas_port_delete_phy(phy->port, phy->phy);
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index b6f0243..2e82097 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -144,6 +144,8 @@ void sas_deform_port(struct asd_sas_phy *phy)
 		port->port_dev->pathways--;
 
 	if (port->num_phys == 1) {
+		if (port->port_dev)
+			port->port_dev->gone = 1;
 		sas_unregister_domain_devices(port);
 		sas_port_delete(port->port);
 		port->port = NULL;
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index c29ba47..61d2679 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -228,6 +228,13 @@ int sas_queuecommand(struct scsi_cmnd *cmd,
 			goto out;
 		}
 
+		/* If the device fell off, no sense in issuing commands */
+		if (dev->gone) {
+			cmd->result = DID_BAD_TARGET << 16;
+			scsi_done(cmd);
+			goto out;
+		}
+
 		res = -ENOMEM;
 		task = sas_create_task(cmd, dev, GFP_ATOMIC);
 		if (!task)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8ad7465..73c5b15 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -207,6 +207,7 @@ struct domain_device {
 	};
 
 	void *lldd_dev;
+	int gone;
 };
 
 struct sas_discovery_event {

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-04 22:35 ` [PATCH] libsas: Don't issue commands to devices that have been hot-removed Darrick J. Wong
@ 2007-12-04 22:48 ` Jeff Garzik
  2007-12-04 23:17 ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2007-12-04 22:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

Darrick J. Wong wrote:
> Hrm... does this patch help? You'll get a bunch of ATA/SAS disk errors
> printed to the screen if you yank the disk, but at least libsas won't
> get stuck waiting for the cache-flush commands to time out.
> ---
> sd will get hung up issuing commands to flush write cache if a SAS device
> is unplugged without warning. Change libsas to reject commands to domain
> devices that have already gone away.
>
> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> ---
>
>  drivers/scsi/libsas/sas_ata.c       |    4 ++++
>  drivers/scsi/libsas/sas_expander.c  |    3 +++
>  drivers/scsi/libsas/sas_port.c      |    2 ++
>  drivers/scsi/libsas/sas_scsi_host.c |    7 +++++++
>  include/scsi/libsas.h               |    1 +
>  5 files changed, 17 insertions(+), 0 deletions(-)

Seems sane...

> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 0829b55..f5e5213 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -161,6 +161,10 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
>  	unsigned int num = 0;
>  	unsigned int xfer = 0;
>
> +	/* If the device fell off, no sense in issuing commands */
> +	if (dev->gone)
> +		return AC_ERR_SYSTEM;
> +
>  	task = sas_alloc_task(GFP_ATOMIC);
>  	if (!task)
>  		return AC_ERR_SYSTEM;

As an aside, issues like this really really imply a need to move libsas
away from the old libata EH stuff (like brking did with ipr, in patches).

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-04 22:48 ` Jeff Garzik
@ 2007-12-04 23:17 ` Darrick J. Wong
  2007-12-04 23:40 ` Jeff Garzik
  ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Darrick J. Wong @ 2007-12-04 23:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:

> As an aside, issues like this really really imply a need to move libsas
> away from the old libata EH stuff (like brking did with ipr, in patches).

Hm... does the new libata EH handle the case of "device was
unplugged, don't bother trying to send any more commands"?

In general, I agree that sas-ata should adopt the new EH.
Unfortunately, I believe the old way of sas-ata configuring ATA ports is
somehow not compatible with the new EH stuff and causes a crash during
the device probe with my patch to move sas-ata to the new EH. If I
apply the patch that migrates sas-ata to use brking's latest ata-sas
configuration mechanism (the one that creates real ata_hosts), I see
(a) lots and lots of ATA hosts getting created (one per ATA port;
possibly undesirable if you've a SAS topology with a lot of SATA disks)
and (b) NCQ disks don't seem to work if you unplug the disk and plug
it back in (unless NCQ is disabled entirely). Jeff, by any chance have
you tried plugging SATA devices into your SAS controllers?

James Bottomley wondered if it would be easier to have sas-ata call only
into the parts of libata that convert SCSI commands to ATA taskfiles,
though I'm unsure how many wormy cans that would open.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-04 23:17 ` Darrick J. Wong
@ 2007-12-04 23:40 ` Jeff Garzik
  2007-12-06 16:55 ` Brian King
  2008-02-25 23:39 ` Jeff Garzik
  2 siblings, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2007-12-04 23:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

Darrick J. Wong wrote:
> On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:
>
>> As an aside, issues like this really really imply a need to move libsas
>> away from the old libata EH stuff (like brking did with ipr, in patches).
>
> Hm... does the new libata EH handle the case of "device was
> unplugged, don't bother trying to send any more commands"?
>
> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)
> and (b) NCQ disks don't seem to work if you unplug the disk and plug
> it back in (unless NCQ is disabled entirely). Jeff, by any chance have
> you tried plugging SATA devices into your SAS controllers?

aic94xx yes, bcm and mv no. Will take a look though...

> James Bottomley wondered if it would be easier to have sas-ata call only
> into the parts of libata that convert SCSI commands to ATA taskfiles,
> though I'm unsure how many wormy cans that would open.

You want more than that. You want to make sure libata is the place for
knowledge about weird ATA devices, SATA quirks, ATA device error
handling (to be distinguished from ATA /link/ error handling), and other
areas.

That stuff shouldn't be duplicated, and you /really/ do not want to
re-learn all those lessons all over again ;-)

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-04 23:17 ` Darrick J. Wong
  2007-12-04 23:40 ` Jeff Garzik
@ 2007-12-06 16:55 ` Brian King
  2008-02-25 23:39 ` Jeff Garzik
  2 siblings, 0 replies; 14+ messages in thread
From: Brian King @ 2007-12-06 16:55 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jeff Garzik, Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

Darrick J. Wong wrote:
> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)

The new libata EH ends up spending more time in the error handling
thread than the old code did. One of the reasons having multiple
ATA/SCSI hosts is a good thing is that is the granularity of error
handling, so it prevents stalling all the other devices under that SAS
HBA while we are hitting errors on an ATAPI SATA device, for example.
Arguably, SATA users of libata already have one SCSI host per ATA port,
so my SAS patches really just bring SAS in line with that design...

-Brian

--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] libsas: Don't issue commands to devices that have been hot-removed.
  2007-12-04 23:17 ` Darrick J. Wong
  2007-12-04 23:40 ` Jeff Garzik
  2007-12-06 16:55 ` Brian King
@ 2008-02-25 23:39 ` Jeff Garzik
  2 siblings, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2008-02-25 23:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jeff Garzik, Krzysztof Błaszkowski, linux-scsi, vst, Alexis Bruemmer

(digging through old email)

Darrick J. Wong wrote:
> On Tue, Dec 04, 2007 at 05:48:33PM -0500, Jeff Garzik wrote:
>
>> As an aside, issues like this really really imply a need to move libsas
>> away from the old libata EH stuff (like brking did with ipr, in patches).
>
> Hm... does the new libata EH handle the case of "device was
> unplugged, don't bother trying to send any more commands"?

Yes, most certainly :) We wouldn't have hotplug support without that...

> In general, I agree that sas-ata should adopt the new EH.
> Unfortunately, I believe the old way of sas-ata configuring ATA ports is
> somehow not compatible with the new EH stuff and causes a crash during
> the device probe with my patch to move sas-ata to the new EH. If I
> apply the patch that migrates sas-ata to use brking's latest ata-sas
> configuration mechanism (the one that creates real ata_hosts), I see
> (a) lots and lots of ATA hosts getting created (one per ATA port;
> possibly undesirable if you've a SAS topology with a lot of SATA disks)
> and (b) NCQ disks don't seem to work if you unplug the disk and plug
> it back in (unless NCQ is disabled entirely). Jeff, by any chance have
> you tried plugging SATA devices into your SAS controllers?

Just tested mvsas here...

> James Bottomley wondered if it would be easier to have sas-ata call only
> into the parts of libata that convert SCSI commands to ATA taskfiles,
> though I'm unsure how many wormy cans that would open.

Like Brian K noted, libata-EH is heavily involved in "anything not
hotpath read/write", including but not limited to:

	PMP, hotplug, device probing, device revalidation, explicit
	sequencing of ATA commands during initialization (critical for
	getting many ATA devices working)

You don't want to reinvent or duplicate all those ATA device
initialization/revalidation quirks.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread