* dramatic I/O slowdown after upgrading 2.6.32->3.0
@ 2012-03-30 16:50 Michael Tokarev
  2012-04-02 16:58 ` Jonathan Corbet
  2012-04-05 23:29 ` Jan Kara
  0 siblings, 2 replies; 14+ messages in thread

From: Michael Tokarev @ 2012-03-30 16:50 UTC (permalink / raw)
  To: Kernel Mailing List

Hello.

I'm observing a dramatic slowdown of several hosts after upgrading
from 2.6.32.y to 3.0.x i686 kernels (in both cases from kernel.org,
and in both cases the version is close to the latest in its series).

On 2.6.32 everything is fast.  On 3.0 the same operations which
complete instantly take ages to finish.

For example, among the actual differences observed: the munin-graph
process on 2.6.32 completes in a few seconds writing to an ext4 /var
filesystem.  On 3.0, the same process takes about a minute and keeps
all 5 hard drives (md raid5) 99% busy all this time.

apt-get upgrade (from debian/ubuntu) first reads the current package
status database.  This takes about 3 seconds on a freshly booted
2.6.32, and about 40 seconds on a freshly booted 3.0, again keeping
all 5 hdds 99% busy (according to iostat).

Only the kernel is different; all the rest is exactly the same.  I can
reboot into 2.6.32 again after running 3.0, and the system is fast
again.

The machine is relatively old: an IBM xSeries 345 server with a
2.66GHz Xeon (stepping 9) CPU, a Broadcom chipset, an LSI Logic /
Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI controller
and 5x74Gb pSCSI drives.  But that is obviously not a reason for it to
run _this_ slow... ;)

There's another machine here, with an AMD BE-2400 CPU, nVidia MCP55
chipset, AHA-3940U2x pSCSI controller and a set of 74Gb HDDs.  It
shows similar symptoms after upgrading from 2.6.32 to 3.0 -- every I/O
becomes very slow with all HDDs staying busy for long periods.

What's the way to debug this issue?

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread
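[Editorial note: the "99% busy (according to iostat)" figure above is iostat's %util column, which comes from the sysstat package. The same number can be approximated straight from /proc/diskstats; the helper below is a hypothetical sketch, reading the per-device io_ticks counter (field 13, per the kernel's Documentation/iostats.txt) twice, one second apart.]

```shell
#!/bin/sh
# Approximate iostat's %util from /proc/diskstats: field 3 is the device
# name and field 13 is the milliseconds spent doing I/O (io_ticks).
# Sample twice, one second apart, and print each device's busy percentage.
interval=1
snap() { awk '$3 !~ /^(loop|ram)/ { print $3, $13 }' /proc/diskstats; }
before=$(snap)
sleep "$interval"
after=$(snap)
echo "$after" | while read -r dev ms; do
    prev=$(echo "$before" | awk -v d="$dev" '$1 == d { print $2 }')
    [ -n "$prev" ] || continue
    # delta ms of busy time over interval*1000 ms, expressed as a percent
    echo "$dev $(( (ms - prev) / (interval * 10) ))% busy"
done
```

On the machine in the thread, the md raid5 members (sdb..sdf) would each show close to 100 during the slow apt-get run.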
* Re: dramatic I/O slowdown after upgrading 2.6.32->3.0
  2012-03-30 16:50 dramatic I/O slowdown after upgrading 2.6.32->3.0 Michael Tokarev
@ 2012-04-02 16:58 ` Jonathan Corbet
  2012-04-05 23:29 ` Jan Kara
  1 sibling, 0 replies; 14+ messages in thread

From: Jonathan Corbet @ 2012-04-02 16:58 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Kernel Mailing List

On Fri, 30 Mar 2012 20:50:54 +0400
Michael Tokarev <mjt@tls.msk.ru> wrote:

> I'm observing a dramatic slowdown of several hosts after upgrading
> from 2.6.32.y to 3.0.x i686 kernels (in both cases from kernel.org,
> and in both cases the version is close to the latest in its series).
>
> On 2.6.32 everything is fast.  On 3.0 the same operations which
> complete instantly take ages to finish.

[...]

> What's the way to debug this issue?

There is a huge gap between those two kernels, so nobody is going to
have much luck guessing about what has changed.  A good first step
might be to do a binary search among the intermediate kernel releases
to figure out which one slowed things down for you.

jon

^ permalink raw reply	[flat|nested] 14+ messages in thread
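[Editorial note: the binary search Jon describes is what `git bisect` automates. A toy sketch of the mechanics on a throwaway repository — for the real case you would clone the kernel tree, mark v2.6.32 good and v3.0 bad, and replace the automated check with a build-boot-and-time cycle; here "broken" just means the tracked file holds a value >= 4.]

```shell
#!/bin/sh
# Demonstrate the git-bisect workflow on a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email mjt@example.com   # placeholder identity
git config user.name  "Test User"
for i in 1 2 3 4 5 6; do                # six commits; 4 is the "regression"
    echo "$i" > state
    git add state
    git commit -q -m "commit $i"
done
git bisect start
git bisect bad  HEAD        # the slow end (3.0 in the thread)
git bisect good HEAD~5      # the fast end (2.6.32)
# Automate the good/bad decision; a real kernel bisect would build, boot,
# and time the workload here instead of reading a file.
git bisect run sh -c 'test "$(cat state)" -lt 4'
git bisect reset
```

`git bisect run` prints the first bad commit after about log2(N) steps, which is what makes the 2.6.32->3.0 gap tractable at all.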
* Re: dramatic I/O slowdown after upgrading 2.6.32->3.0
  2012-03-30 16:50 dramatic I/O slowdown after upgrading 2.6.32->3.0 Michael Tokarev
  2012-04-02 16:58 ` Jonathan Corbet
@ 2012-04-05 23:29 ` Jan Kara
  2012-04-06 4:45 ` Michael Tokarev
  1 sibling, 1 reply; 14+ messages in thread

From: Jan Kara @ 2012-04-05 23:29 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Kernel Mailing List

Hello,

On Fri 30-03-12 20:50:54, Michael Tokarev wrote:
> I'm observing a dramatic slowdown of several hosts after upgrading
> from 2.6.32.y to 3.0.x i686 kernels (in both cases from kernel.org,
> and in both cases the version is close to the latest in its series).
>
> On 2.6.32 everything is fast.  On 3.0 the same operations which
> complete instantly take ages to finish.
>
> For example, among the actual differences observed: the munin-graph
> process on 2.6.32 completes in a few seconds writing to an ext4 /var
> filesystem.  On 3.0, the same process takes about a minute and keeps
> all 5 hard drives (md raid5) 99% busy all this time.
>
> apt-get upgrade (from debian/ubuntu) first reads the current package
> status database.  This takes about 3 seconds on a freshly booted
> 2.6.32, and about 40 seconds on a freshly booted 3.0, again keeping
> all 5 hdds 99% busy (according to iostat).
>
> Only the kernel is different; all the rest is exactly the same.  I can
> reboot into 2.6.32 again after running 3.0, and the system is fast
> again.
>
> The machine is relatively old: an IBM xSeries 345 server with a
> 2.66GHz Xeon (stepping 9) CPU, a Broadcom chipset, an LSI Logic /
> Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI controller
> and 5x74Gb pSCSI drives.  But that is obviously not a reason for it to
> run _this_ slow... ;)
>
> There's another machine here, with an AMD BE-2400 CPU, nVidia MCP55
> chipset, AHA-3940U2x pSCSI controller and a set of 74Gb HDDs.  It
> shows similar symptoms after upgrading from 2.6.32 to 3.0 -- every I/O
> becomes very slow with all HDDs staying busy for long periods.
>
> What's the way to debug this issue?

  Identifying the particular kernel where things regressed might help, as
Jon wrote.  Just off the top of my head, 3.0 had a bug in device plugging
so readahead was broken.  I think it was addressed in the -stable series,
so you might want to check out the latest 3.0-stable.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: dramatic I/O slowdown after upgrading 2.6.32->3.0
  2012-04-05 23:29 ` Jan Kara
@ 2012-04-06 4:45 ` Michael Tokarev
  2012-04-10 2:26 ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread

From: Michael Tokarev @ 2012-04-06 4:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: Kernel Mailing List

On 06.04.2012 03:29, Jan Kara wrote:

> On Fri 30-03-12 20:50:54, Michael Tokarev wrote:
>> I'm observing a dramatic slowdown of several hosts after upgrading
>> from 2.6.32.y to 3.0.x i686 kernels (in both cases from kernel.org,
>> and in both cases the version is close to the latest in its series).
[]
>> What's the way to debug this issue?
>   Identifying the particular kernel where things regressed might help, as
> Jon wrote.  Just off the top of my head, 3.0 had a bug in device plugging
> so readahead was broken.  I think it was addressed in the -stable series,
> so you

That's definitely not readahead, since writes are painfully slow too.
I found one more example -- extlinux --once="test kernel" with 3.0
takes about 20 seconds to complete on an idle system.

> might want to check out the latest 3.0-stable.

I did mention this in my initial email (the part quoted above) --
both 2.6.32 and 3.0 are the latest from each series; right now it
is 3.0.27.

Yesterday I tried to do some bisection, but ended up with an unbootable
system (it is a remote production server), so now I'm waiting for remote
hands to repair it (I don't yet know what went wrong; we'll figure it
out).  I have some time during the night when I can do anything with
that machine, but I have to keep it reachable/working on each reboot.

Apparently I was wrong saying that there's another machine which
suffers from the same issue -- nope, the other machine had an unrelated
issue which I fixed.  So it turns out that out of about 200 different
machines, I have just one machine which does not run the 3.0 kernel
properly.  I especially tried 3.0 on a few more - different - machines
last weekend, in order to see which other machines have this problem,
but found nothing.

So I'll try to continue (or actually _start_) the bisection on this
very server, as far as the difficult conditions allow.

I just thought I'd ask first; maybe someone knows offhand what may be
the problem.. ;)

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: dramatic I/O slowdown after upgrading 2.6.32->3.0
  2012-04-06 4:45 ` Michael Tokarev
@ 2012-04-10 2:26 ` Dave Chinner
  2012-04-10 6:00 ` dramatic I/O slowdown after upgrading 2.6.38->3.0+ Michael Tokarev
  0 siblings, 1 reply; 14+ messages in thread

From: Dave Chinner @ 2012-04-10 2:26 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Jan Kara, Kernel Mailing List

On Fri, Apr 06, 2012 at 08:45:40AM +0400, Michael Tokarev wrote:
> On 06.04.2012 03:29, Jan Kara wrote:
>
> > On Fri 30-03-12 20:50:54, Michael Tokarev wrote:
> >> I'm observing a dramatic slowdown of several hosts after upgrading
> >> from 2.6.32.y to 3.0.x i686 kernels (in both cases from kernel.org,
> >> and in both cases the version is close to the latest in its series).
> []
> >> What's the way to debug this issue?
> >   Identifying the particular kernel where things regressed might help, as
> > Jon wrote.  Just off the top of my head, 3.0 had a bug in device plugging
> > so readahead was broken.  I think it was addressed in the -stable series,
> > so you
>
> That's definitely not readahead, since writes are painfully slow too.
> I found one more example -- extlinux --once="test kernel" with 3.0
> takes about 20 seconds to complete on an idle system.
>
> > might want to check out the latest 3.0-stable.
>
> I did mention this in my initial email (the part quoted above) --
> both 2.6.32 and 3.0 are the latest from each series; right now it
> is 3.0.27.
>
> Yesterday I tried to do some bisection, but ended up with an unbootable
> system (it is a remote production server), so now I'm waiting for remote
> hands to repair it (I don't yet know what went wrong; we'll figure it
> out).  I have some time during the night when I can do anything with
> that machine, but I have to keep it reachable/working on each reboot.
>
> Apparently I was wrong saying that there's another machine which
> suffers from the same issue -- nope, the other machine had an unrelated
> issue which I fixed.  So it turns out that out of about 200 different
> machines, I have just one machine which does not run the 3.0 kernel
> properly.  I especially tried 3.0 on a few more - different - machines
> last weekend, in order to see which other machines have this problem,
> but found nothing.
>
> So I'll try to continue (or actually _start_) the bisection on this
> very server, as far as the difficult conditions allow.
>
> I just thought I'd ask first; maybe someone knows offhand what may be
> the problem.. ;)

Barriers. Turn them off, and see if that fixes your problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread
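[Editorial note: Dave's barrier suggestion can be tried from a running system, without touching /etc/fstab. A hedged sketch — the mount point is only an example, the remount itself needs root and is therefore left commented out, and note that ext4 of that era spelled the option `nobarrier` (or `barrier=0`).]

```shell
#!/bin/sh
# Print the options in effect for a mount point, so the barrier setting
# can be checked before and after a remount.  /var is the thread's
# example; fall back to / so the script runs on any Linux box.
mnt=/var
grep -q " $mnt " /proc/mounts || mnt=/
awk -v m="$mnt" '$2 == m { print $1, $3, $4 }' /proc/mounts
# The actual experiment (root only, and easy to revert):
#   mount -o remount,nobarrier /var
#   mount -o remount,barrier /var     # back to the default
```

Comparing the workload's timing with and without barriers is exactly the test Dave proposes; as the next message shows, here it made only a small difference.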
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 2:26 ` Dave Chinner
@ 2012-04-10 6:00 ` dramatic I/O slowdown after upgrading 2.6.38->3.0+ Michael Tokarev
  2012-04-10 15:13 ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread

From: Michael Tokarev @ 2012-04-10 6:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, Kernel Mailing List

On 10.04.2012 06:26, Dave Chinner wrote:

> Barriers. Turn them off, and see if that fixes your problem.

Thank you Dave for the hint.  And nope, that's not it, not at all... ;)
While turning off barriers helps a tiny bit, gaining a few percent
back from the huge slowdown, it does not cure the issue.

Meanwhile, I observed the following:

1) the issue persists on more recent kernels too; I tried 3.3 and it
is also as slow as 3.0.

2) at least the 2.6.38 kernel works fine, as fast as 2.6.32; I'll try
2.6.39 next.  I updated $subject accordingly.

3) the most important thing, I think: this is a general I/O speed
issue.  Here's why:

2.6.38:
# dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s

3.0:
# dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s

That's about a 20 times difference on a direct read from the
same - idle - device!!

Preparing for another bisect attempt, slowly.....

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread
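[Editorial note: Michael's measurement generalizes into a small script. /dev/sdb is the thread's device; when no block device is given, a scratch file stands in so the script is safe to run anywhere, and because O_DIRECT is not supported on every filesystem (tmpfs, notably) there is a buffered fallback.]

```shell
#!/bin/sh
# Time a 16 MiB read the way the thread does (bs=1M, iflag=direct).
# $1 may name a block device such as /dev/sdb; otherwise a temp file
# is created and read back.
dev=$1
if [ -z "$dev" ] || [ ! -b "$dev" ]; then
    dev=$(mktemp)
    dd if=/dev/zero of="$dev" bs=1M count=16 2>/dev/null
fi
# O_DIRECT first; fall back to a buffered read where it is unsupported.
dd if="$dev" of=/dev/null bs=1M count=16 iflag=direct 2>&1 ||
    dd if="$dev" of=/dev/null bs=1M count=16 2>&1
```

The MB/s figure dd prints on its last line is the number being compared between 2.6.38 and 3.0 above.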
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 6:00 ` dramatic I/O slowdown after upgrading 2.6.38->3.0+ Michael Tokarev
@ 2012-04-10 15:13 ` Jan Kara
  2012-04-10 19:25 ` Suresh Jayaraman
  ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread

From: Jan Kara @ 2012-04-10 15:13 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Dave Chinner, Jan Kara, Kernel Mailing List

On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
> On 10.04.2012 06:26, Dave Chinner wrote:
>
> > Barriers. Turn them off, and see if that fixes your problem.
>
> Thank you Dave for the hint.  And nope, that's not it, not at all... ;)
> While turning off barriers helps a tiny bit, gaining a few percent
> back from the huge slowdown, it does not cure the issue.
>
> Meanwhile, I observed the following:
>
> 1) the issue persists on more recent kernels too; I tried 3.3 and it
> is also as slow as 3.0.
>
> 2) at least the 2.6.38 kernel works fine, as fast as 2.6.32; I'll try
> 2.6.39 next.  I updated $subject accordingly.
>
> 3) the most important thing, I think: this is a general I/O speed
> issue.  Here's why:
>
> 2.6.38:
> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
>
> 3.0:
> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
>
> That's about a 20 times difference on a direct read from the
> same - idle - device!!

  Huh, that's a huge difference for such a trivial load. So we can rule out
filesystems, writeback, mm. I also wouldn't think it's the IO scheduler but
you can always check by comparing dd numbers after
  echo none >/sys/block/sdb/queue/scheduler
  Anyway, the most likely cause seems to be some driver issue (which would
also explain why you can see it only on one machine). I'd also compare very
closely config files of the two kernels if there isn't some unexpected
difference...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 15:13 ` Jan Kara
@ 2012-04-10 19:25 ` Suresh Jayaraman
  2012-04-10 19:51 ` Jan Kara
  2012-04-11 0:20 ` Henrique de Moraes Holschuh
  2012-04-11 9:40 ` Michael Tokarev
  2 siblings, 1 reply; 14+ messages in thread

From: Suresh Jayaraman @ 2012-04-10 19:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Michael Tokarev, Dave Chinner, Kernel Mailing List

On 04/10/2012 08:43 PM, Jan Kara wrote:
> On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
>> On 10.04.2012 06:26, Dave Chinner wrote:
>>
>>> Barriers. Turn them off, and see if that fixes your problem.
>>
>> Thank you Dave for the hint.  And nope, that's not it, not at all... ;)
>> While turning off barriers helps a tiny bit, gaining a few percent
>> back from the huge slowdown, it does not cure the issue.
>>
>> Meanwhile, I observed the following:
>>
>> 1) the issue persists on more recent kernels too; I tried 3.3 and it
>> is also as slow as 3.0.
>>
>> 2) at least the 2.6.38 kernel works fine, as fast as 2.6.32; I'll try
>> 2.6.39 next.  I updated $subject accordingly.
>>
>> 3) the most important thing, I think: this is a general I/O speed
>> issue.  Here's why:
>>
>> 2.6.38:
>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
>>
>> 3.0:
>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
>>
>> That's about a 20 times difference on a direct read from the
>> same - idle - device!!
>   Huh, that's a huge difference for such a trivial load. So we can rule out
> filesystems, writeback, mm. I also wouldn't think it's the IO scheduler but
> you can always check by comparing dd numbers after
>   echo none >/sys/block/sdb/queue/scheduler

s/none/noop

you meant noop, of course?


Suresh

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 19:25 ` Suresh Jayaraman
@ 2012-04-10 19:51 ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread

From: Jan Kara @ 2012-04-10 19:51 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Jan Kara, Michael Tokarev, Dave Chinner, Kernel Mailing List

On Wed 11-04-12 00:55:44, Suresh Jayaraman wrote:
> On 04/10/2012 08:43 PM, Jan Kara wrote:
> > On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
> >> On 10.04.2012 06:26, Dave Chinner wrote:
> >>
> >>> Barriers. Turn them off, and see if that fixes your problem.
> >>
> >> Thank you Dave for the hint.  And nope, that's not it, not at all... ;)
> >> While turning off barriers helps a tiny bit, gaining a few percent
> >> back from the huge slowdown, it does not cure the issue.
> >>
> >> Meanwhile, I observed the following:
> >>
> >> 1) the issue persists on more recent kernels too; I tried 3.3 and it
> >> is also as slow as 3.0.
> >>
> >> 2) at least the 2.6.38 kernel works fine, as fast as 2.6.32; I'll try
> >> 2.6.39 next.  I updated $subject accordingly.
> >>
> >> 3) the most important thing, I think: this is a general I/O speed
> >> issue.  Here's why:
> >>
> >> 2.6.38:
> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> >> 100+0 records in
> >> 100+0 records out
> >> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
> >>
> >> 3.0:
> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> >> 100+0 records in
> >> 100+0 records out
> >> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
> >>
> >> That's about a 20 times difference on a direct read from the
> >> same - idle - device!!
> >   Huh, that's a huge difference for such a trivial load. So we can rule out
> > filesystems, writeback, mm. I also wouldn't think it's the IO scheduler but
> > you can always check by comparing dd numbers after
> >   echo none >/sys/block/sdb/queue/scheduler
>
> s/none/noop
>
> you meant noop, of course?
  Yeah. Thanks for the correction!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread
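[Editorial note: with Suresh's correction applied, the scheduler check looks like the sketch below. The sysfs paths are the real ones; writing to them needs root, and not every kernel build ships the noop elevator, so the switch is left commented out.]

```shell
#!/bin/sh
# List the I/O scheduler in effect for each visible block device; the
# active one is shown in brackets, e.g. "noop deadline [cfq]".
for q in /sys/block/*/queue/scheduler; do
    [ -e "$q" ] || continue          # tolerate systems with no entries
    printf '%s: %s\n' "${q%/queue/scheduler}" "$(cat "$q")"
done
# The switch Jan meant, for the thread's /dev/sdb (root only):
#   echo noop > /sys/block/sdb/queue/scheduler
```

Re-running the dd comparison after the switch tells you whether the elevator is involved; in the thread it made very little difference.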
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 15:13 ` Jan Kara
  2012-04-10 19:25 ` Suresh Jayaraman
@ 2012-04-11 0:20 ` Henrique de Moraes Holschuh
  2012-04-11 9:40 ` Michael Tokarev
  2 siblings, 0 replies; 14+ messages in thread

From: Henrique de Moraes Holschuh @ 2012-04-11 0:20 UTC (permalink / raw)
  To: Jan Kara; +Cc: Michael Tokarev, Dave Chinner, Kernel Mailing List

On Tue, 10 Apr 2012, Jan Kara wrote:
> > 2.6.38:
> > # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> > 100+0 records in
> > 100+0 records out
> > 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
> >
> > 3.0:
> > # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> > 100+0 records in
> > 100+0 records out
> > 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
> >
> > That's about a 20 times difference on a direct read from the
> > same - idle - device!!

You might want to investigate the cpu-idle stuff (especially intel-idle
if it is an Intel box with a recent processor: force the box to use
acpi-idle instead) and the cpufreq stuff (try the test with the
"performance" governor).

> Anyway, the most likely cause seems to be some driver issue (which would
> also explain why you can see it only on one machine). I'd also compare very
> closely config files of the two kernels if there isn't some unexpected
> difference...

Indeed.  But that's such a massive performance drop, I'd also be
comparing the boot log messages of both kernels with diff, and also the
lspci -vvv output... just in case :-)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 14+ messages in thread
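[Editorial note: the config comparison Jan and Henrique both suggest can be narrowed to just the options that differ. A self-contained sketch with two inlined toy configs — point old/new at the real /boot/config-* files instead; the CONFIG_SCSI_SCAN_ASYNC option from later in the thread serves as the toy difference.]

```shell
#!/bin/sh
# Diff two kernel .config files, keeping only CONFIG_* lines that changed.
# Toy configs are inlined so the script runs anywhere; in practice use
# e.g. /boot/config-2.6.38 and /boot/config-3.0 as old and new.
set -e
old=$(mktemp); new=$(mktemp)
cat > "$old" <<'EOF'
CONFIG_MODULES=y
# CONFIG_SCSI_SCAN_ASYNC is not set
EOF
cat > "$new" <<'EOF'
CONFIG_MODULES=y
CONFIG_SCSI_SCAN_ASYNC=y
EOF
# Matches both "+CONFIG_FOO=y" and "-# CONFIG_FOO is not set" lines,
# skipping the diff headers and unchanged context.
diff -u "$old" "$new" | grep '^[+-].*CONFIG_' || true
```

The same filter applied to saved dmesg logs (diff of boot messages, as Henrique suggests) quickly surfaces lines like the mpt_config reset warning seen later in the thread.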
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-10 15:13 ` Jan Kara
  2012-04-10 19:25 ` Suresh Jayaraman
  2012-04-11 0:20 ` Henrique de Moraes Holschuh
@ 2012-04-11 9:40 ` Michael Tokarev
  2012-04-11 17:19 ` Mike Christie
  2 siblings, 1 reply; 14+ messages in thread

From: Michael Tokarev @ 2012-04-11 9:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Dave Chinner, Kernel Mailing List, SCSI Mailing List

On 10.04.2012 19:13, Jan Kara wrote:
> On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
[]
>> 2.6.38:
>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
>>
>> 3.0:
>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
>>
>> That's about a 20 times difference on a direct read from the
>> same - idle - device!!
>   Huh, that's a huge difference for such a trivial load. So we can rule out
> filesystems, writeback, mm. I also wouldn't think it's the IO scheduler but
> you can always check by comparing dd numbers after
>   echo none >/sys/block/sdb/queue/scheduler

The scheduler makes very little difference.

>   Anyway, the most likely cause seems to be some driver issue (which would
> also explain why you can see it only on one machine). I'd also compare very

Yes, it appears to be the mptspi driver (CCing linux-scsi@).

Another problem we've hit while trying various kernels/options (and
which makes the whole experiment very dangerous for us) -- after
loading a 3.0+ kernel, the machine does not always boot back into the
older kernel, often freezing while mptspi is initializing or starting
to do something, so that only a hard reset helps.  And since this is a
remote production server, the fact that it can freeze any time we do
some experiments makes the whole issue quite difficult.
It appears that the 3.0+ driver does something to the controller which
makes at least the 2.6.32 kernel/driver misbehave, at least sometimes.

> closely config files of the two kernels if there isn't some unexpected
> difference...

There's one difference between my 2.6.38 and 3.0 configs -- in 3.0+ I
enabled CONFIG_SCSI_SCAN_ASYNC.  But due to the above I'm not sure I
want to experiment again right now, as I need some remote hands to
bring the machine back if it gets stuck again.

And there are at least two quite significant (I think) differences in
the dmesg output.  3.0 kernel:

[ 2.807983] Fusion MPT base driver 3.04.19
[ 2.808064] Copyright (c) 1999-2008 LSI Corporation
[ 2.809826] Fusion MPT SPI Host driver 3.04.19
[ 2.810003] mptspi 0000:08:07.0: PCI INT A -> GSI 27 (level, low) -> IRQ 27
[ 2.810347] mptbase: ioc0: Initiating bringup
[ 3.223351] ioc0: LSI53C1030 B2: Capabilities={Initiator}
[ 4.113981] scsi4 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=27
[ 4.482468] mptspi 0000:08:07.1: PCI INT B -> GSI 28 (level, low) -> IRQ 28
[ 4.482674] mptbase: ioc1: Initiating bringup

the extra warning and 15-sec delay:

[ 19.480030] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[ 20.120020] mptbase: ioc0: Attempting Retry Config request type 0x4, page 0x1, action 2
[ 20.120173] mptbase: ioc0: Retry completed ret=0x0 timeleft=4500
[ 20.121075] scsi 4:0:0:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S25J PQ: 0 ANSI: 3
[ 20.121186] scsi target4:0:0: Beginning Domain Validation
[ 20.131738] scsi target4:0:0: Ending Domain Validation
[ 20.131865] scsi target4:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 20.133037] scsi 4:0:1:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3
[ 20.133136] scsi target4:0:1: Beginning Domain Validation
[ 20.145040] scsi target4:0:1: Ending Domain Validation
[ 20.145169] scsi target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 20.146368] scsi
4:0:2:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3 [ 20.146470] scsi target4:0:2: Beginning Domain Validation [ 20.156885] scsi target4:0:2: Ending Domain Validation [ 20.157013] scsi target4:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127) [ 20.158196] scsi 4:0:3:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3 [ 20.158297] scsi target4:0:3: Beginning Domain Validation [ 20.168737] scsi target4:0:3: Ending Domain Validation [ 20.168868] scsi target4:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127) [ 20.172797] scsi 4:0:4:0: Direct-Access IBM-ESXS MAW3073NC FN C206 PQ: 0 ANSI: 4 [ 20.172898] scsi target4:0:4: Beginning Domain Validation [ 20.192801] scsi target4:0:4: Ending Domain Validation [ 20.192934] scsi target4:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU HMCS (6.25 ns, offset 127) [ 20.753704] scsi 4:0:8:0: Processor IBM 32P0032a S320 1 1 PQ: 0 ANSI: 2 [ 20.753810] scsi target4:0:8: Beginning Domain Validation [ 20.754545] scsi target4:0:8: Ending Domain Validation [ 20.754671] scsi target4:0:8: asynchronous [ 23.572044] sd 4:0:0:0: [sdb] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) [ 23.573106] sd 4:0:0:0: [sdb] Write Protect is off [ 23.573204] sd 4:0:0:0: [sdb] Mode Sense: cb 00 00 08 [ 23.574437] sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 23.577515] sd 4:0:1:0: [sdc] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) [ 23.578569] sd 4:0:1:0: [sdc] Write Protect is off [ 23.578676] sd 4:0:1:0: [sdc] Mode Sense: cb 00 00 08 [ 23.579773] sd 4:0:1:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 23.582656] sd 4:0:2:0: [sdd] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) [ 23.583718] sd 4:0:2:0: [sdd] Write Protect is off [ 23.583813] sd 4:0:2:0: [sdd] Mode Sense: cb 00 00 08 [ 23.585126] sd 4:0:2:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 23.589278] 
sd 4:0:3:0: [sde] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) [ 23.590356] sd 4:0:3:0: [sde] Write Protect is off [ 23.590456] sd 4:0:3:0: [sde] Mode Sense: cb 00 00 08 [ 23.591613] sd 4:0:3:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 23.595105] sd 4:0:4:0: [sdf] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) [ 23.597872] sd 4:0:4:0: [sdf] Write Protect is off [ 23.597980] sd 4:0:4:0: [sdf] Mode Sense: cf 00 00 08 [ 23.599393] sd 4:0:4:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 23.605403] sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 > [ 23.608453] sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 > [ 23.619822] sde: sde1 sde2 sde3 sde4 < sde5 sde6 sde7 > [ 23.620675] sdd: sdd1 sdd2 sdd3 sdd4 < sdd5 sdd6 sdd7 > [ 23.622831] sd 4:0:0:0: [sdb] Attached SCSI disk [ 23.624151] sdf: sdf1 sdf2 sdf3 sdf4 < sdf5 sdf6 sdf7 > [ 23.705272] sd 4:0:1:0: [sdc] Attached SCSI disk [ 23.740272] sd 4:0:2:0: [sdd] Attached SCSI disk [ 23.743111] sd 4:0:4:0: [sdf] Attached SCSI disk [ 23.743997] sd 4:0:3:0: [sde] Attached SCSI disk [ 34.480015] mptbase: ioc1: ERROR - Wait IOC_READY state (0x20000000) timeout(15)! 
[ 38.870012] ioc1: LSI53C1030 B2: Capabilities={Initiator} [ 39.270011] ioc0: LSI53C1030 B2: Capabilities={Initiator} [ 40.553990] scsi5 : ioc1: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=28 And 2.6.32 (without dmesg timestamps compiled in): Fusion MPT base driver 3.04.12 Copyright (c) 1999-2008 LSI Corporation Fusion MPT SPI Host driver 3.04.12 mptspi 0000:08:07.0: PCI INT A -> GSI 27 (level, low) -> IRQ 27 mptbase: ioc0: Initiating bringup ioc0: LSI53C1030 B2: Capabilities={Initiator} scsi4 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=27 scsi 4:0:0:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S25J PQ: 0 ANSI: 3 scsi target4:0:0: Beginning Domain Validation scsi target4:0:0: Ending Domain Validation scsi target4:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127) sd 4:0:0:0: [sda] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) scsi 4:0:1:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3 sd 4:0:0:0: [sda] Write Protect is off scsi target4:0:1: Beginning Domain Validation sd 4:0:0:0: [sda] Mode Sense: cb 00 00 08 sd 4:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA sda: scsi target4:0:1: Ending Domain Validation scsi target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127) sda1 sda2 sda3 sda4 < sd 4:0:1:0: [sdb] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) scsi 4:0:2:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3 scsi target4:0:2: Beginning Domain Validation sd 4:0:1:0: [sdb] Write Protect is off sd 4:0:1:0: [sdb] Mode Sense: cb 00 00 08 sd 4:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA sda5 sdb: sda6 scsi target4:0:2: Ending Domain Validation scsi target4:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127) sda7 > sd 4:0:2:0: [sdc] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB) scsi 4:0:3:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3 scsi 
target4:0:3: Beginning Domain Validation
sd 4:0:2:0: [sdc] Write Protect is off
sd 4:0:2:0: [sdc] Mode Sense: cb 00 00 08
 sdb1 sdb2 sdb3 sdb4 <
sd 4:0:2:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdb5
 sdc: sdb6
scsi target4:0:3: Ending Domain Validation
sd 4:0:0:0: [sda] Attached SCSI disk
scsi target4:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
 sdb7 >
 sdc1 sdc2 sdc3 sdc4 < sdc5
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 215, comm: blkid Not tainted 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e5b0a>] ? touch_atime+0xea/0x130
 [<c10a23ee>] ? generic_file_aio_read+0x41e/0x6f0
 [<c10d3370>] ? do_sync_read+0x0/0x110
 [<c10d3446>] ? do_sync_read+0xd6/0x110
 [<c1057a30>] ? autoremove_wake_function+0x0/0x40
 [<c1024ca8>] ? kunmap_atomic+0x58/0x70
 [<c10ba9b5>] ? handle_mm_fault+0x2e5/0x9c0
 [<c10f9c94>] ? block_llseek+0xb4/0xe0
 [<c10d3c5f>] ? vfs_read+0x8f/0x190
 [<c10d31b8>] ? vfs_llseek+0x38/0x50
 [<c10d3d9c>] ? sys_read+0x3c/0x70
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818c5 ]---
bdi-block not registered
 sdc6
sd 4:0:3:0: [sdd] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
sd 4:0:3:0: [sdd] Write Protect is off
sd 4:0:3:0: [sdd] Mode Sense: cb 00 00 08
 sdc7 >
sd 4:0:3:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 215, comm: blkid Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e5b0a>] ? touch_atime+0xea/0x130
 [<c10a23ee>] ? generic_file_aio_read+0x41e/0x6f0
 [<c10d3370>] ? do_sync_read+0x0/0x110
 [<c10d3446>] ? do_sync_read+0xd6/0x110
 [<c1057a30>] ? autoremove_wake_function+0x0/0x40
 [<c10e0ffc>] ? do_vfs_ioctl+0x6c/0x550
 [<c10f9c94>] ? block_llseek+0xb4/0xe0
 [<c10d3c5f>] ? vfs_read+0x8f/0x190
 [<c10d31b8>] ? vfs_llseek+0x38/0x50
 [<c10d3d9c>] ? sys_read+0x3c/0x70
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818c6 ]---
bdi-block not registered
 sdd: sdd1
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 261, comm: mdev Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e753b>] ? inode_setattr+0xab/0x170
 [<c10e73a8>] ? inode_change_ok+0xa8/0x190
 [<c10e77c0>] ? notify_change+0x1c0/0x330
 [<c10d2c09>] ? sys_fchmodat+0xb9/0xe0
 [<c10d2c50>] ? sys_chmod+0x20/0x30
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818c7 ]---
bdi-block not registered
 sdd2 sdd3 sdd4 <
scsi 4:0:4:0: Direct-Access IBM-ESXS MAW3073NC FN C206 PQ: 0 ANSI: 4
scsi target4:0:4: Beginning Domain Validation
 sdd5 sdd6
sd 4:0:1:0: [sdb] Attached SCSI disk
 sdd7 >
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in:
scsi target4:0:4: Ending Domain Validation
scsi target4:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU HMCS (6.25 ns, offset 127)
 mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 283, comm: blkid Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e5b0a>] ? touch_atime+0xea/0x130
 [<c10a23ee>] ? generic_file_aio_read+0x41e/0x6f0
 [<c10d3370>] ? do_sync_read+0x0/0x110
 [<c10d3446>] ? do_sync_read+0xd6/0x110
 [<c1057a30>] ? autoremove_wake_function+0x0/0x40
 [<c1024ca8>] ? kunmap_atomic+0x58/0x70
 [<c10ba9b5>] ? handle_mm_fault+0x2e5/0x9c0
 [<c10f9c94>] ? block_llseek+0xb4/0xe0
 [<c10d3c5f>] ? vfs_read+0x8f/0x190
 [<c10d31b8>] ? vfs_llseek+0x38/0x50
 [<c10d3d9c>] ? sys_read+0x3c/0x70
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818c8 ]---
bdi-block not registered
sd 4:0:2:0: [sdc] Attached SCSI disk
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 283, comm: blkid Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e5b0a>] ? touch_atime+0xea/0x130
 [<c10a23ee>] ? generic_file_aio_read+0x41e/0x6f0
 [<c10d3370>] ? do_sync_read+0x0/0x110
 [<c10d3446>] ? do_sync_read+0xd6/0x110
 [<c1057a30>] ? autoremove_wake_function+0x0/0x40
 [<c10e0ffc>] ? do_vfs_ioctl+0x6c/0x550
 [<c10f9c94>] ? block_llseek+0xb4/0xe0
 [<c10d3c5f>] ? vfs_read+0x8f/0x190
 [<c10d31b8>] ? vfs_llseek+0x38/0x50
 [<c10d3d9c>] ? sys_read+0x3c/0x70
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818c9 ]---
bdi-block not registered
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 283, comm: blkid Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e5b0a>] ? touch_atime+0xea/0x130
 [<c10a23ee>] ? generic_file_aio_read+0x41e/0x6f0
 [<c10d3370>] ? do_sync_read+0x0/0x110
 [<c10d3446>] ? do_sync_read+0xd6/0x110
 [<c1057a30>] ? autoremove_wake_function+0x0/0x40
 [<c10e0ffc>] ? do_vfs_ioctl+0x6c/0x550
 [<c10f9c94>] ? block_llseek+0xb4/0xe0
 [<c10d3c5f>] ? vfs_read+0x8f/0x190
 [<c10d31b8>] ? vfs_llseek+0x38/0x50
 [<c10d3d9c>] ? sys_read+0x3c/0x70
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818ca ]---
bdi-block not registered
sd 4:0:4:0: [sde] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
sd 4:0:4:0: [sde] Write Protect is off
sd 4:0:4:0: [sde] Mode Sense: cf 00 00 08
sd 4:0:4:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 329, comm: mdev Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e753b>] ? inode_setattr+0xab/0x170
 [<c10e73a8>] ? inode_change_ok+0xa8/0x190
 [<c10e77c0>] ? notify_change+0x1c0/0x330
 [<c10d2c09>] ? sys_fchmodat+0xb9/0xe0
 [<c10d2c50>] ? sys_chmod+0x20/0x30
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818cb ]---
bdi-block not registered
 sde: sde1 sde2 sde3 sde4 < sde5 sde6
scsi 4:0:8:0: Processor IBM 32P0032a S320 1 1 PQ: 0 ANSI: 2
scsi target4:0:8: Beginning Domain Validation
scsi target4:0:8: Ending Domain Validation
scsi target4:0:8: asynchronous
 sde7 >
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1122 __mark_inode_dirty+0xd4/0x130()
Hardware name: eserver xSeries 345 -[867052G]-
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi usb_storage e1000 ohci_hcd pata_serverworks libata usbhid hid usbcore nls_base sd_mod scsi_mod
Pid: 369, comm: mdev Tainted: G W 2.6.32-i686 #2.6.32.50
Call Trace:
 [<c103e83e>] ? warn_slowpath_common+0x6e/0xb0
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c103e893>] ? warn_slowpath_null+0x13/0x20
 [<c10ef314>] ? __mark_inode_dirty+0xd4/0x130
 [<c10e753b>] ? inode_setattr+0xab/0x170
 [<c10e73a8>] ? inode_change_ok+0xa8/0x190
 [<c10e77c0>] ? notify_change+0x1c0/0x330
 [<c10d2c09>] ? sys_fchmodat+0xb9/0xe0
 [<c10d2c50>] ? sys_chmod+0x20/0x30
 [<c1002d38>] ? sysenter_do_call+0x12/0x2c
---[ end trace 440816db81b818cc ]---
bdi-block not registered
sd 4:0:3:0: [sdd] Attached SCSI disk
mptspi 0000:08:07.1: PCI INT B -> GSI 28 (level, low) -> IRQ 28
mptbase: ioc1: Initiating bringup
ioc1: LSI53C1030 B2: Capabilities={Initiator}
sd 4:0:4:0: [sde] Attached SCSI disk

These warnings are actually what prompted me to try a more recent
kernel, plus the fact that 2.6.32 is reaching its end of life (so to
say). The warnings aren't always shown; sometimes it boots fine.

Here's a dmesg from the 2.6.38 kernel, which shows no issues whatsoever:

[ 2.910215] Fusion MPT base driver 3.04.18
[ 2.910299] Copyright (c) 1999-2008 LSI Corporation
[ 2.922747] Fusion MPT SPI Host driver 3.04.18
[ 2.922912] mptspi 0000:08:07.0: PCI INT A -> GSI 27 (level, low) -> IRQ 27
[ 2.923145] mptbase: ioc0: Initiating bringup
[ 3.340016] ioc0: LSI53C1030 B2: Capabilities={Initiator}
[ 4.230644] scsi4 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=27
[ 4.601148] scsi 4:0:0:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S25J PQ: 0 ANSI: 3
[ 4.601257] scsi target4:0:0: Beginning Domain Validation
[ 4.612026] scsi target4:0:0: Ending Domain Validation
[ 4.612155] scsi target4:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 4.614638] sd 4:0:0:0: [sdb] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 4.616258] sd 4:0:0:0: [sdb] Write Protect is off
[ 4.616341] sd 4:0:0:0: [sdb] Mode Sense: cb 00 00 08
[ 4.616524] scsi 4:0:1:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3
[ 4.616653] scsi target4:0:1: Beginning Domain Validation
[ 4.618073] sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4.627985] scsi target4:0:1: Ending Domain Validation
[ 4.628122] scsi target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 4.630165] sd 4:0:1:0: [sdc] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 4.631535] sd 4:0:1:0: [sdc] Write Protect is off
[ 4.631626] sd 4:0:1:0: [sdc] Mode Sense: cb 00 00 08
[ 4.632327] scsi 4:0:2:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3
[ 4.632524] scsi target4:0:2: Beginning Domain Validation
[ 4.633594] sd 4:0:1:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4.644183] scsi target4:0:2: Ending Domain Validation
[ 4.645739] scsi target4:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 4.648020] sd 4:0:2:0: [sdd] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 4.649041] sd 4:0:2:0: [sdd] Write Protect is off
[ 4.649133] sd 4:0:2:0: [sdd] Mode Sense: cb 00 00 08
[ 4.650478] scsi 4:0:3:0: Direct-Access IBM-ESXS DTN073C3UCDY10FN S27M PQ: 0 ANSI: 3
[ 4.650618] sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 >
[ 4.650650] scsi target4:0:3: Beginning Domain Validation
[ 4.651796] sd 4:0:2:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4.661785] sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 >
[ 4.663819] scsi target4:0:3: Ending Domain Validation
[ 4.664108] scsi target4:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS HMCS (6.25 ns, offset 127)
[ 4.665724] sd 4:0:0:0: [sdb] Attached SCSI disk
[ 4.684535] sd 4:0:3:0: [sde] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 4.685537] sd 4:0:3:0: [sde] Write Protect is off
[ 4.685649] sd 4:0:3:0: [sde] Mode Sense: cb 00 00 08
[ 4.686944] sd 4:0:3:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4.687854] sdd: sdd1 sdd2 sdd3 sdd4 < sdd5 sdd6 sdd7 >
[ 4.702237] scsi 4:0:4:0: Direct-Access IBM-ESXS MAW3073NC FN C206 PQ: 0 ANSI: 4
[ 4.702392] scsi target4:0:4: Beginning Domain Validation
[ 4.716211] sd 4:0:1:0: [sdc] Attached SCSI disk
[ 4.724526] sde: sde1 sde2 sde3 sde4 < sde5 sde6 sde7 >
[ 4.727350] scsi target4:0:4: Ending Domain Validation
[ 4.727559] scsi target4:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU HMCS (6.25 ns, offset 127)
[ 4.737932] sd 4:0:2:0: [sdd] Attached SCSI disk
[ 4.746930] sd 4:0:4:0: [sdf] 143374000 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 4.749646] sd 4:0:4:0: [sdf] Write Protect is off
[ 4.749743] sd 4:0:4:0: [sdf] Mode Sense: cf 00 00 08
[ 4.751182] sd 4:0:4:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 5.255072] scsi 4:0:8:0: Processor IBM 32P0032a S320 1 1 PQ: 0 ANSI: 2
[ 5.255205] scsi target4:0:8: Beginning Domain Validation
[ 5.255937] scsi target4:0:8: Ending Domain Validation
[ 5.256065] scsi target4:0:8: asynchronous
[ 6.261554] sdf: sdf1 sdf2 sdf3 sdf4 < sdf5 sdf6 sdf7 >
[ 7.015879] mptspi 0000:08:07.1: PCI INT B -> GSI 28 (level, low) -> IRQ 28
[ 7.016573] mptbase: ioc1: Initiating bringup
[ 7.423346] ioc1: LSI53C1030 B2: Capabilities={Initiator}
[ 8.340889] sd 4:0:3:0: [sde] Attached SCSI disk
[ 8.344413] scsi5 : ioc1: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=28
[ 8.351396] sd 4:0:4:0: [sdf] Attached SCSI disk

The warnings shown in the 2.6.32 dmesg above don't appear on every
boot; more often than not it boots without any warnings, just like
this 2.6.38 dmesg.
Here's the controller in question, from lspci:

08:07.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
	Subsystem: IBM Device 026c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 72 (4250ns min, 4500ns max), Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 27
	Region 0: I/O ports at 2600 [size=256]
	Region 1: Memory at f9ff0000 (64-bit, non-prefetchable) [size=64K]
	Region 3: Memory at f9fe0000 (64-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at a0100000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [68] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=1
		Status: Dev=08:07.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=8 DMCRS=16 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: mptspi

08:07.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
	Subsystem: IBM Device 026c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 72 (4250ns min, 4500ns max), Cache Line Size: 32 bytes
	Interrupt: pin B routed to IRQ 28
	Region 0: I/O ports at 2700 [size=256]
	Region 1: Memory at f9fd0000 (64-bit, non-prefetchable) [size=64K]
	Region 3: Memory at f9fc0000 (64-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at a0200000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [68] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=1
		Status: Dev=08:07.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=8 DMCRS=16 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: mptspi

(Only one of the two ports is in use.)

Are there any other guesses about all this?

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-11  9:40 ` Michael Tokarev
@ 2012-04-11 17:19 ` Mike Christie
  2012-04-11 17:55 ` Michael Tokarev
  2012-04-11 18:28 ` Jan Kara
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Christie @ 2012-04-11 17:19 UTC (permalink / raw)
To: Michael Tokarev
Cc: Jan Kara, Dave Chinner, Kernel Mailing List, SCSI Mailing List

On 04/11/2012 04:40 AM, Michael Tokarev wrote:
> On 10.04.2012 19:13, Jan Kara wrote:
>> > On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
> []
>>> >> 2.6.38:
>>> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>>> >> 100+0 records in
>>> >> 100+0 records out
>>> >> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
>>> >>
>>> >> 3.0:
>>> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>>> >> 100+0 records in
>>> >> 100+0 records out
>>> >> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
>>> >>
>>> >> That's about 20 times difference on direct read from the
>>> >> same - idle - device!!
>> > Huh, that's a huge difference for such a trivial load. So we can rule out
>> > filesystems, writeback, mm. I also wouldn't think it's IO scheduler but
>> > you can always check by comparing dd numbers after
>> > echo none >/sys/block/sdb/queue/scheduler

Did you try newer 3.X kernels or just 3.0?

We were hitting a similar problem with iscsi. Same workload and it
started with 2.6.38. I think it turned out to be this issue:

// thread with issue like what we hit:
http://thread.gmane.org/gmane.linux.kernel/1244680

// Patch that I think fixed issue:
commit 3deaa7190a8da38453c4fabd9dec7f66d17fff67
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Fri Feb 3 15:37:17 2012 -0800

    readahead: fix pipeline break caused by block plug

^ permalink raw reply	[flat|nested] 14+ messages in thread
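[Editorial note] Whether a fix like the commit cited above is contained in a given release can be checked against a clone of the kernel tree with `git merge-base --is-ancestor` (or `git describe --contains`). A minimal sketch of the ancestry check, using a throwaway repository in place of a real kernel clone - the commit message and the `v3.3` tag here are stand-ins; against Linus' tree the check would be `git merge-base --is-ancestor 3deaa7190a8d v3.3`:

```shell
# Throwaway repo standing in for a kernel clone.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "readahead: fix pipeline break caused by block plug"
fix=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" tag v3.3   # stand-in for the real release tag

# Exit status 0 means the fix is an ancestor of (i.e. contained in) v3.3.
if git -C "$repo" merge-base --is-ancestor "$fix" v3.3; then
    result="contained"
else
    result="missing"
fi
echo "fix is $result in v3.3"
rm -rf "$repo"
```

`git merge-base --is-ancestor` answers via exit status only, which makes it convenient in scripts; `git describe --contains <sha>` instead prints the first tag that contains the commit.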
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-11 17:19 ` Mike Christie
@ 2012-04-11 17:55 ` Michael Tokarev
  2012-04-11 18:28 ` Jan Kara
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Tokarev @ 2012-04-11 17:55 UTC (permalink / raw)
To: Mike Christie
Cc: Jan Kara, Dave Chinner, Kernel Mailing List, SCSI Mailing List

On 11.04.2012 21:19, Mike Christie wrote:
> On 04/11/2012 04:40 AM, Michael Tokarev wrote:
>> On 10.04.2012 19:13, Jan Kara wrote:
>>>> On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
>> []
>>>>>> 2.6.38:
>>>>>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>>>>>> 100+0 records in
>>>>>> 100+0 records out
>>>>>> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
>>>>>>
>>>>>> 3.0:
>>>>>> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
>>>>>> 100+0 records in
>>>>>> 100+0 records out
>>>>>> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
>>>>>>
>>>>>> That's about 20 times difference on direct read from the
>>>>>> same - idle - device!!
>>>> Huh, that's a huge difference for such a trivial load. So we can rule out
>>>> filesystems, writeback, mm. I also wouldn't think it's IO scheduler but
>>>> you can always check by comparing dd numbers after
>>>> echo none >/sys/block/sdb/queue/scheduler
>
> Did you try newer 3.X kernels or just 3.0?

I tried 3.3.1; it shows exactly the same very slow speed (about
3 MB/sec vs 60 MB/sec).

> We were hitting a similar problem with iscsi. Same workload and it
> started with 2.6.38. I think it turned out to be this issue:
>
> // thread with issue like what we hit:
> http://thread.gmane.org/gmane.linux.kernel/1244680

This thread refers to buffered I/O as far as I can see. Note I
specifically used dd's iflag=direct to rule out all buffered
operations.

The I/O really is very, very slow, and the disk is 100% busy all
this time (which is also not the situation described in the thread
you referenced above - there, the disk (an SSD) does not have
enough work to do).
> // Patch that I think fixed issue:
> commit 3deaa7190a8da38453c4fabd9dec7f66d17fff67
> Author: Shaohua Li <shaohua.li@intel.com>
> Date:   Fri Feb 3 15:37:17 2012 -0800
>
>     readahead: fix pipeline break caused by block plug

I think this patch is included in the 3.3 kernel - it was in 3.3-rc2,
if my git-fu is right. If so, I tried it (as 3.3.1) and it didn't
help at all.

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread
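[Editorial note] For scale: in the dd runs quoted in this subthread, the byte count is identical, so the slowdown factor is simply the ratio of the two copy times - roughly 17x, which the author rounds to "about 20 times". A quick shell check of that arithmetic, using the numbers dd reported:

```shell
# Convert the two dd runs into MB/s and a slowdown ratio.
# Byte count and timings are taken verbatim from the dd output above.
bytes=104857600        # 100 MiB, as reported by dd
t_fast=1.73126         # seconds on 2.6.38
t_slow=29.4508         # seconds on 3.0

ratio=$(awk -v b="$bytes" -v tf="$t_fast" -v ts="$t_slow" 'BEGIN {
    printf "%.1f", (b/tf) / (b/ts)   # equals ts/tf: the byte count cancels
}')
echo "2.6.38 read ${ratio}x faster than 3.0 in the dd test"
```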
* Re: dramatic I/O slowdown after upgrading 2.6.38->3.0+
  2012-04-11 17:19 ` Mike Christie
  2012-04-11 17:55 ` Michael Tokarev
@ 2012-04-11 18:28 ` Jan Kara
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Kara @ 2012-04-11 18:28 UTC (permalink / raw)
To: Mike Christie
Cc: Michael Tokarev, Jan Kara, Dave Chinner, Kernel Mailing List, SCSI Mailing List

On Wed 11-04-12 12:19:43, Mike Christie wrote:
> On 04/11/2012 04:40 AM, Michael Tokarev wrote:
> > On 10.04.2012 19:13, Jan Kara wrote:
> >> > On Tue 10-04-12 10:00:38, Michael Tokarev wrote:
> > []
> >>> >> 2.6.38:
> >>> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> >>> >> 100+0 records in
> >>> >> 100+0 records out
> >>> >> 104857600 bytes (105 MB) copied, 1.73126 s, 60.6 MB/s
> >>> >>
> >>> >> 3.0:
> >>> >> # dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=100
> >>> >> 100+0 records in
> >>> >> 100+0 records out
> >>> >> 104857600 bytes (105 MB) copied, 29.4508 s, 3.6 MB/s
> >>> >>
> >>> >> That's about 20 times difference on direct read from the
> >>> >> same - idle - device!!
> >> > Huh, that's a huge difference for such a trivial load. So we can rule out
> >> > filesystems, writeback, mm. I also wouldn't think it's IO scheduler but
> >> > you can always check by comparing dd numbers after
> >> > echo none >/sys/block/sdb/queue/scheduler
>
> Did you try newer 3.X kernels or just 3.0?
>
> We were hitting a similar problem with iscsi. Same workload and it
> started with 2.6.38. I think it turned out to be this issue:
>
> // thread with issue like what we hit:
> http://thread.gmane.org/gmane.linux.kernel/1244680
>
> // Patch that I think fixed issue:
> commit 3deaa7190a8da38453c4fabd9dec7f66d17fff67
> Author: Shaohua Li <shaohua.li@intel.com>
> Date:   Fri Feb 3 15:37:17 2012 -0800
>
>     readahead: fix pipeline break caused by block plug

I already asked about this but that doesn't seem to be the cause.
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2012-04-11 18:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 16:50 dramatic I/O slowdown after upgrading 2.6.32->3.0 Michael Tokarev
2012-04-02 16:58 ` Jonathan Corbet
2012-04-05 23:29 ` Jan Kara
2012-04-06  4:45 ` Michael Tokarev
2012-04-10  2:26 ` Dave Chinner
2012-04-10  6:00 ` dramatic I/O slowdown after upgrading 2.6.38->3.0+ Michael Tokarev
2012-04-10 15:13 ` Jan Kara
2012-04-10 19:25 ` Suresh Jayaraman
2012-04-10 19:51 ` Jan Kara
2012-04-11  0:20 ` Henrique de Moraes Holschuh
2012-04-11  9:40 ` Michael Tokarev
2012-04-11 17:19 ` Mike Christie
2012-04-11 17:55 ` Michael Tokarev
2012-04-11 18:28 ` Jan Kara