* TCQ problems in 2.6.0-test1: the summary
@ 2003-07-19 22:37 Ivan Gyurdiev
  2003-07-21 12:33 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Gyurdiev @ 2003-07-19 22:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Bartlomiej Zolnierkiewicz, Jens Axboe

2.6.0-test1-current.
The TCQ bugs/problems that I have found in the kernel have not been
addressed yet. Some of the things posted below are new, but most have
been posted before, and there have been no replies. If those bug reports
are invalid, please say so, and I will stop sending them.

================================================================================

I own an IC35L080AVVA07-0 80 GB drive
(IBM Deskstar 120 GXP, which is supposed to support TCQ).
TCQ will not be activated on boot unless TCQ is enabled by default.

The problems:

======================================================================================
1) This patch by Jens Axboe makes my machine bootable with tcq enabled.
It hasn't been included in the kernel yet.

http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1006.html

2) The default for queue depth is commented as 32, but is in fact 8.

3) This is described as a way to set tcq depth in the docs:
  echo "using_tcq:32" > /proc/ide/hdX/settings

but it results in:  proc_ide_write_settings(): parse error
(hdparm -Q works instead)

4) Using a tcq-enabled kernel with queue depth of 8 results in
massive filesystem corruption for me, verified under reiserfs, and xfs.
Elevator choice does not appear to matter, while queue depth is
important - I do not appear to get filesystem corruption with queue
depth of 32. Reiser refuses to mount with such a kernel, and runs
--fix-fixable at boot time. This is reproducible every time.

5) Using a tcq-enabled kernel causes i/o lockups (disk read/write
freezes, while I am still able to move the mouse, type dmesg, etc.). To
trigger the partial i/o lockups I set the disk standby to 5 seconds.
After waking up the disk, I get numerous errors, and I have also gotten
an oops. Attempts to reproduce this with tcq off have failed so far. The
errors and oops are posted here:

http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1682.html

I also get full system hangs like everybody else, but those don't
appear to be caused by tcq; I have tested without it.
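The report doesn't say how the 5-second standby was configured. If it was done with hdparm -S (an assumption), note that -S does not take seconds directly: per the hdparm man page, values 1 to 240 are multiples of 5 seconds, so a 5-second spindown is -S 1. The minimal Python sketch below decodes the documented ranges; standby_seconds() is a hypothetical helper, not part of hdparm.

```python
# Sketch of the hdparm -S standby-timeout encoding as described in the hdparm
# man page. standby_seconds() is a hypothetical helper, not part of hdparm.


def standby_seconds(value):
    """Translate an `hdparm -S` value into a timeout in seconds.

    Returns 0 for "timeouts disabled" and None for the vendor-defined (253)
    and reserved (254) codes.
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value from 0 to 255")
    if value == 0:
        return 0                         # timeouts disabled
    if value <= 240:
        return value * 5                 # 5 s units: 5 s .. 20 min
    if value <= 251:
        return (value - 240) * 30 * 60   # 30 min units: 30 min .. 5.5 h
    if value == 252:
        return 21 * 60                   # 21 minutes
    if value == 255:
        return 21 * 60 + 15              # 21 minutes plus 15 seconds
    return None                          # 253 vendor-defined, 254 reserved
```

So standby_seconds(1) gives 5, which is why `hdparm -S 1 /dev/hdX` would request a 5-second timeout like the one described above.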

=============================================================================================

I am still keeping an old damaged reiser root filesystem, for the
purposes of testing. If there is interest in testing filesystem
corruption bugs, I am willing to do that. Please reply, though, because
I will eventually destroy that partition if there is no interest.

=============================================================================================
Finally, a comment on buffer-cache read speeds:
they're double what they used to be!
577.80 MB/sec vs. roughly 250 MB/sec on 2.4.
That's great. I wonder what causes this improvement?
Thanks to all kernel developers.







* Re: TCQ problems in 2.6.0-test1: the summary
  2003-07-19 22:37 TCQ problems in 2.6.0-test1: the summary Ivan Gyurdiev
@ 2003-07-21 12:33 ` Jens Axboe
  2003-07-21 15:58   ` Ivan Gyurdiev
  2003-07-21 16:21   ` David Ford
  0 siblings, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2003-07-21 12:33 UTC (permalink / raw)
  To: Ivan Gyurdiev; +Cc: linux-kernel, Bartlomiej Zolnierkiewicz

On Sat, Jul 19 2003, Ivan Gyurdiev wrote:
> 2.6.0-test1-current.
> The TCQ bugs/problems that I have found in the kernel have not been
> addressed yet. Some of the things posted below are new, but most have
> been posted before, and there have been no replies. If those bug reports
> are invalid, please say so, and I will stop sending them.
> 
> ================================================================================
> 
> I own an IC35L080AVVA07-0 80 GB drive
> (IBM Deskstar 120 GXP, which is supposed to support TCQ).
> TCQ will not be activated on boot unless TCQ is enabled by default.
> 
> The problems:
> 
> ======================================================================================
> 1) This patch by Jens Axboe makes my machine bootable with tcq enabled.
> It hasn't been included in the kernel yet.
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1006.html

I'll send that in.

> 2) The default for queue depth is commented as 32, but is in fact 8.

I'll fix that up, too.

> 3) This is described as a way to set tcq depth in the docs:
>  echo "using_tcq:32" > /proc/ide/hdX/settings
> 
> but it results in:  proc_ide_write_settings(): parse error
> (hdparm -Q works instead)

Huhm, weird. Someone has broken the proc parsing. I'll look into that.
-Q does the same thing, as you know.
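The docs describe the write format as a single name:value line. As a hedged illustration of what a well-formed line looks like and where a "parse error" can come from, here is a minimal Python sketch; the settings table, its ranges, and parse_setting() are hypothetical and are not the kernel's proc_ide_write_settings() code.

```python
# Hypothetical sketch: parse one "name:value" line in the style documented for
# /proc/ide/hdX/settings. SETTINGS and its ranges are illustrative assumptions;
# this is not the kernel's proc_ide_write_settings().

SETTINGS = {
    # name: (min, max), assumed values for illustration only
    "using_dma": (0, 1),
    "using_tcq": (0, 32),
}


class ParseError(ValueError):
    """Raised when a line does not have the expected name:value shape."""


def parse_setting(line):
    """Split 'name:value', convert the value to int, and range-check it."""
    name, sep, raw = line.strip().partition(":")
    if not sep or not name or not raw:
        raise ParseError(f"parse error: {line!r}")
    if name not in SETTINGS:
        raise ParseError(f"unknown setting: {name!r}")
    try:
        value = int(raw, 10)
    except ValueError:
        raise ParseError(f"bad value: {raw!r}") from None
    low, high = SETTINGS[name]
    if not low <= value <= high:
        raise ParseError(f"{name} out of range [{low}, {high}]: {value}")
    return name, value
```

For example, parse_setting("using_tcq:32") returns ("using_tcq", 32), while "using_tcq=32" is rejected because it has no colon.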

> 4) Using a tcq-enabled kernel with queue depth of 8 results in
> massive filesystem corruption for me, verified under reiserfs, and xfs.
> Elevator choice does not appear to matter, while queue depth is
> important - I do not appear to get filesystem corruption with queue
> depth of 32. Reiser refuses to mount with such a kernel, and runs
> --fix-fixable at boot time. This is reproducible every time.

This is really strange. The only difference between using 8 or 32 tags
is when ide-disk stops attempting to queue. Are you getting any errors
in dmesg when this happens? Reading the start io path for this, it looks
correct to me. I'll have to try and reproduce when I get back.
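To make the point about tag counts concrete, here is a minimal Python model of a fixed-depth tag pool, assuming a simple "first free slot" allocator. TagPool and queue_burst() are hypothetical names; this is not the ide-disk or ide-tcq implementation, only an illustration that the depth decides when the driver stops queuing.

```python
# Hypothetical model of a fixed-depth tag pool. The only thing the depth
# changes is how soon get() returns None and the driver stops queuing.
# TagPool and queue_burst() are illustrative, not kernel code.


class TagPool:
    def __init__(self, depth):
        self.depth = depth
        self.in_use = [False] * depth  # one slot per tag

    def get(self):
        """Return a free tag, or None when all `depth` tags are outstanding."""
        for tag, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[tag] = True
                return tag
        return None  # queue full: stop queuing until a tag is released

    def put(self, tag):
        """Release a tag when the drive reports that command complete."""
        if not self.in_use[tag]:
            raise ValueError(f"tag {tag} released twice")
        self.in_use[tag] = False

    def outstanding(self):
        return sum(self.in_use)


def queue_burst(depth, requests):
    """Try to issue `requests` commands at once; return (queued, deferred)."""
    pool = TagPool(depth)
    queued = deferred = 0
    for _ in range(requests):
        if pool.get() is None:
            deferred += 1
        else:
            queued += 1
    return queued, deferred
```

With a burst of 20 requests, queue_burst(8, 20) gives (8, 12) and queue_burst(32, 20) gives (20, 0): only the depth-8 setup reaches the queue-full path, so if the two depths behave differently, that path is where the configurations diverge. The model says nothing about where the reported corruption actually comes from.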

> 5) Using a tcq-enabled kernel causes i/o lockups (disk read/write
> freezes, while I am still able to move the mouse, type dmesg, etc.). To
> trigger the partial i/o lockups I set the disk standby to 5 seconds.
> After waking up the disk, I get numerous errors, and I have also gotten
> an oops. Attempts to reproduce this with tcq off have failed so far. The
> errors and oops are posted here:
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1682.html

Noted, that is something I haven't tested.

> I am still keeping an old damaged reiser root filesystem, for the
> purposes of testing. If there is interest in testing filesystem
> corruption bugs, I am willing to do that. Please reply, though, because
> I will eventually destroy that partition if there is no interest.

If it's an ide tcq bug, it isn't very interesting. You can safely fry
that partition.

-- 
Jens Axboe



* Re: TCQ problems in 2.6.0-test1: the summary
  2003-07-21 12:33 ` Jens Axboe
@ 2003-07-21 15:58   ` Ivan Gyurdiev
  2003-07-21 16:21   ` David Ford
  1 sibling, 0 replies; 6+ messages in thread
From: Ivan Gyurdiev @ 2003-07-21 15:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, Bartlomiej Zolnierkiewicz


> This is really strange. The only difference between using 8 or 32 tags
> is when ide-disk stops attempting to queue. Are you getting any errors
> in dmesg when this happens? Reading the start io path for this, it looks
> correct to me. I'll have to try and reproduce when I get back.

I get filesystem errors.
With a queue depth of 8, reiserfs fails the filesystem check every
time. The one time I decided to bypass the check to look for
errors, I got a bunch:

http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1307.html

XFS will boot, but corrupts the fs after a while, and I got an oops there:

http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/2583.html

Other than that - no messages.


> If it's an ide tcq bug, it isn't very interesting. You can safely fry
> that partition.

Already done.





* Re: TCQ problems in 2.6.0-test1: the summary
  2003-07-21 16:21   ` David Ford
@ 2003-07-21 16:10     ` Ivan Gyurdiev
  2003-07-22 11:36       ` David Ford
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Gyurdiev @ 2003-07-21 16:10 UTC (permalink / raw)
  To: David Ford; +Cc: linux-kernel


> Note, reiserfsck never indicates any problems were found or fixed but 
> the problems are nonetheless fixed.  (reiser guys: reiserfsck 
> --fix-fixable always results in "--fix-fixable ignored")

I think it does that when the root fs is mounted - not sure.
You should fsck from a different root.
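Ivan's explanation, which he hedges himself, is that reiserfsck declines to repair a filesystem that is currently mounted. A minimal Python sketch of checking that before a repair run, by looking the device up in /proc/mounts-format text; mount_state() and the sample device names are hypothetical, and this does not confirm how reiserfsck itself decides to ignore --fix-fixable.

```python
# Hypothetical helper: report whether a device appears in /proc/mounts-style
# text and whether it is mounted read-only. Each line has the fields
# device, mount point, fs type, options, dump, pass. Mount points containing
# spaces appear escaped (e.g. \040) and are returned as-is.


def mount_state(device, mounts_text):
    """Return None if `device` is not listed, else (mount_point, read_only)."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        options = fields[3].split(",")
        return fields[1], "ro" in options
    return None
```

On a live system the text would come from reading /proc/mounts; a result of ("/", False) means the device is the read-write root, which is the case Ivan suggests avoiding by checking from a different root.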


> Jul 19 10:55:31 james hdc: invalidating tag queue (0 commands)
> Jul 19 10:55:41 james ide_tcq_intr_timeout: timeout waiting for 
> completion interrupt

Yes - that's in my original email.

> and further disk access causes D state.  I upgraded this machine to 
> 2.6.0-test1 and now it's having fits with apic or acpi but that's 
> another email.  And a side note, if I have TCQ compiled in w/ 
> 2.6.0-test1, the kernel barfs a long 40+ function OOPS on bootup.

Jens's patch in my email should fix that.
However, TCQ seems rather broken to me right now (or maybe it's just my
machine) - so I'd be careful with it.






* Re: TCQ problems in 2.6.0-test1: the summary
  2003-07-21 12:33 ` Jens Axboe
  2003-07-21 15:58   ` Ivan Gyurdiev
@ 2003-07-21 16:21   ` David Ford
  2003-07-21 16:10     ` Ivan Gyurdiev
  1 sibling, 1 reply; 6+ messages in thread
From: David Ford @ 2003-07-21 16:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ivan Gyurdiev, linux-kernel, Bartlomiej Zolnierkiewicz, Hans Reiser


>> 4) Using a tcq-enabled kernel with queue depth of 8 results in
>> massive filesystem corruption for me, verified under reiserfs, and xfs.
>> Elevator choice does not appear to matter, while queue depth is
>> important - I do not appear to get filesystem corruption with queue
>> depth of 32. Reiser refuses to mount with such a kernel, and runs
>> --fix-fixable at boot time. This is reproducible every time.
>
> This is really strange. The only difference between using 8 or 32 tags
> is when ide-disk stops attempting to queue. Are you getting any errors
> in dmesg when this happens? Reading the start io path for this, it looks
> correct to me. I'll have to try and reproduce when I get back.

On my laptop:

Here is the only thing that is similar on my system.  When TCQ is 
enabled, I have filesystem problems (minimal) every time I reboot and it 
nearly always affects the same files every time.  I too am using 
reiserfs.  I usually run reiserfsck and emerge the particular group of 
files.  For a while it was openssl libraries, now it's kde libraries.

Note, reiserfsck never indicates any problems were found or fixed but 
the problems are nonetheless fixed.  (reiser guys: reiserfsck 
--fix-fixable always results in "--fix-fixable ignored")

Also note that there is never any indication that something is 
wacky.  Out of the blue, a file or files are corrupt, and the bootup 
result is the same every time.

(~) # hdparm -I /dev/hda |head -n10

/dev/hda:

ATA device, with non-removable media
        Model Number:       IC25N030ATCS04-0                       
        Serial Number:      CSL305DAGVK71A
        Firmware Revision:  CA3OA72A


On one of my servers:

TCQ fscks it up bad.  It ran for over a month on .73 with nary an issue, 
then all of a sudden it started barfing within hours of boot, complaining 
about:

Jul 19 10:55:31 james hdc: invalidating tag queue (0 commands)
Jul 19 10:55:41 james ide_tcq_intr_timeout: timeout waiting for 
completion interrupt

and further disk access causes D state.  I upgraded this machine to 
2.6.0-test1 and now it's having fits with apic or acpi but that's 
another email.  And a side note, if I have TCQ compiled in w/ 
2.6.0-test1, the kernel barfs a long 40+ function OOPS on bootup.  It's 
a 24/7 server so I haven't put a serial cable on it to capture the oops 
yet :/

(~) # hdparm -I /dev/hdc

/dev/hdc:

ATA device, with non-removable media
        Model Number:       IBM-DPTA-372050                        
        Serial Number:      JMYJMFZ1555
        Firmware Revision:  P76OA30A

more if desired.

Other than this, I don't see any filesystem issues.

david




* Re: TCQ problems in 2.6.0-test1: the summary
  2003-07-21 16:10     ` Ivan Gyurdiev
@ 2003-07-22 11:36       ` David Ford
  0 siblings, 0 replies; 6+ messages in thread
From: David Ford @ 2003-07-22 11:36 UTC (permalink / raw)
  To: Ivan Gyurdiev; +Cc: linux-kernel

Ivan Gyurdiev wrote:

>> Note, reiserfsck never indicates any problems were found or fixed
>> but the problems are nonetheless fixed.  (reiser guys: reiserfsck
>> --fix-fixable always results in "--fix-fixable ignored")
>
> I think it does that when the root fs is mounted - not sure.
> You should fsck from a different root.


Right, let me just add a hard drive to my notebook ;)

>> Jul 19 10:55:31 james hdc: invalidating tag queue (0 commands)
>> Jul 19 10:55:41 james ide_tcq_intr_timeout: timeout waiting for
>> completion interrupt
>
> Yes - that's in my original email.
>
>> and further disk access causes D state.  I upgraded this machine to
>> 2.6.0-test1 and now it's having fits with apic or acpi but that's
>> another email.  And a side note, if I have TCQ compiled in w/
>> 2.6.0-test1, the kernel barfs a long 40+ function OOPS on bootup.
>
> Jens's patch in my email should fix that.
> However, TCQ seems rather broken to me right now (or maybe it's just my
> machine) - so I'd be careful with it.

noted.

I'm waiting for the next batch of updates, 2.6.0-test1 seems to be quite 
broken in several places for several of my machines. :/

david


