* All devices on host blocked
@ 2014-06-12 17:36 Michael Robbert
From: Michael Robbert @ 2014-06-12 17:36 UTC
  To: linux-scsi

I have a large JBOD attached to my server via an LSI SAS2308 PCI card (mpt2sas driver). I currently have about 40 drives assembled into 4 Linux software RAID sets, and I am using those RAID volumes as back-end devices for GPFS.
Everything was working fine about a week ago, when I had 20 drives and 2 RAID volumes. Then I added 20 new disks, all the same model, and now I frequently see every device behind the SAS card report device_blocked, immediately followed by device_unblocked. These events correlate with periods of many seconds with no data throughput, and they happen often enough to cause major throughput problems. I have seen similar problems in the past, but they were accompanied by some kind of disk-specific error, and I could fix the situation by removing the disk. In this case there are no other errors in any log besides the device_blocked and device_unblocked messages on every single device.
This system is not in production yet, so I can blow it all away if I need to, but I really want to understand what is causing this so that, if it comes back once we go into production, I'll be able to fix it without major disruption. I suspect there is a misbehaving drive, but nothing points to a single drive and I could be completely wrong about that. Does anybody have any clue where to look?

Here is what the error logs look like:

Jun 11 19:29:17 storage003 kernel: sd 6:0:0:0: device_blocked, handle(0x0016)
Jun 11 19:29:17 storage003 kernel: sd 6:0:1:0: device_blocked, handle(0x000b)
Jun 11 19:29:17 storage003 kernel: sd 6:0:2:0: device_blocked, handle(0x000c)
Jun 11 19:29:17 storage003 kernel: ses 6:0:3:0: device_blocked, handle(0x000e)
Jun 11 19:29:17 storage003 kernel: sd 6:0:4:0: device_blocked, handle(0x000f)
Jun 11 19:29:17 storage003 kernel: sd 6:0:5:0: device_blocked, handle(0x0010)
... Same thing for the rest of the devices on host6
Jun 11 19:29:18 storage003 kernel: sd 6:0:0:0: device_unblocked and set to running, handle(0x0016)
Jun 11 19:29:18 storage003 kernel: sd 6:0:1:0: device_unblocked and set to running, handle(0x000b)
Jun 11 19:29:18 storage003 kernel: sd 6:0:2:0: device_unblocked and set to running, handle(0x000c)
Jun 11 19:29:18 storage003 kernel: ses 6:0:3:0: device_unblocked and set to running, handle(0x000e)
Jun 11 19:29:18 storage003 kernel: sd 6:0:4:0: device_unblocked and set to running, handle(0x000f)
Jun 11 19:29:18 storage003 kernel: sd 6:0:5:0: device_unblocked and set to running, handle(0x0010)
... Same thing for the rest of the devices again.
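
To make the pattern in these logs easier to see across many events, here is a minimal Python sketch that counts how many distinct devices report device_blocked or device_unblocked in each second. It assumes syslog lines in exactly the format shown above; the names `LINE_RE`, `parse_events`, and `summarize` are made up for illustration and are not part of any driver or tool.

```python
import re
from collections import defaultdict

# Matches lines in the format shown above, e.g.
# "Jun 11 19:29:17 storage003 kernel: sd 6:0:0:0: device_blocked, handle(0x0016)"
# (assumption: all relevant lines follow this layout)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<drv>\w+)\s+(?P<hctl>\d+:\d+:\d+:\d+):\s+"
    r"(?P<event>device_blocked|device_unblocked)\b.*?"
    r"handle\((?P<handle>0x[0-9a-fA-F]+)\)"
)


def parse_events(lines):
    """Return (timestamp, host:channel:target:lun, event, handle) tuples."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((m["ts"], m["hctl"], m["event"], m["handle"]))
    return events


def summarize(events):
    """Count distinct devices per (timestamp, event) pair."""
    seen = defaultdict(set)
    for ts, hctl, event, _handle in events:
        seen[(ts, event)].add(hctl)
    return {key: len(devs) for key, devs in seen.items()}
```

Feeding the full log through `summarize(parse_events(open(path)))` (with `path` pointing at the log file) would show whether every event really covers all devices on the host, or whether some events involve only a subset. Syslog timestamps have one-second resolution, so this cannot show which device was blocked first within a second.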

Thanks,
Mike Robbert
