linux-kernel.vger.kernel.org archive mirror
* [BUG] ide dma_timer_expiry, then hard lockup
@ 2007-06-18 17:57 Linas Vepstas
  2007-06-18 18:11 ` Stuart_Hayes
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Linas Vepstas @ 2007-06-18 17:57 UTC (permalink / raw)
  To: linux-ide, linux-kernel



I've got a hard lockup in the ide subsystem, probably
due to some irq spew or something like that.

I've just bought a brand new Maxtor 320GB disk drive
for the insane price of $70 US to replace another 
failing drive. It works well under light load;
I was able to copy about 60GB to it. However, 
under heavy load, such as reconstruction of an MD 
RAID-1 array, it'll lock up the kernel.  Which means
that my system won't boot :-(

I'm running 2.6.21.1, although the problem seems to occur 
in 2.6.19 and 2.6.18 too; it's been there a while; I vaguely
remember similar problems in 2.6.5 or 2.6.10.

I get an 
"hdc: dma_timer_expiry: dma status == 0x21" 

and 10 seconds later,

"hdc: DMA Timeout error"

at which point the system is locked up hard.
Magic SysRq does not work at all. The hard drive activity light
stays fully lit.  Inserting printk's into the kernel, I find the
hang to be in a surprising place: 


ide_dma_timeout_retry() in ide-io.c 
  prints the "hdc: DMA Timeout error" then calls
  HWIF(drive)->ide_dma_end(drive);
    which returns, and then calls 
  hwif->INB(IDE_STATUS_REG) which is needed as an argument to ide_error()

But this hangs! -- The INB never returns.
Now:  hwif->INB = ide_inb; in ide-iops.c

So putting a printk into ide_inb() shows that
the printk before the readb() is printed, and the
printk after the readb is not (!!)

I find this rather surprising, as I can't imagine how the
readb can fail. My current vague theory is that doing this
readb makes the hard drive go really nuts, and it probably
ties some interrupt line high, and so the linux kernel 
gets stuck trying to handle the irq flood. I just don't know
enough about the i386 architecture, or about interrupts, to 
prove or disprove this.

Background: this is on an old dual-cpu intel (coppermine??)
box; the controller is a HighPoint HPT366 on the motherboard.
This is an old parallel ATA (80-conductor cable) setup.

I can get the system to boot by sneaking in an 
"hdparm -d0 /dev/hdc" early in the boot process, to turn off 
the use of DMA, but it seems that PIO is so slow, that it takes 
forever to get NFS started.

I can get it to boot, by unplugging /dev/hdc. Unfortunately,
given the RAID mirroring, the only usable copy of /, /usr is on
/dev/hda and the only usable copy of /home is on /dev/hdc, so
I'm screwed ... 

Any suggestions, experiments, experimental patches, data gathering,
etc. are welcome. The sooner, the better...

--linas
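
A minimal sketch of the bracketing technique described above, written as a
userspace C program rather than kernel code. The names fake_hwif, fake_inb,
traced_inb and trace_lines are hypothetical stand-ins, and fprintf()/fflush()
take the place of printk(). The only point is that a trace line is emitted
and flushed on each side of a register read made through a function pointer,
so that if the read never returns, the "before" line is the last thing seen.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the hwif structure: only the INB hook. */
struct fake_hwif {
	unsigned char (*INB)(unsigned long port);
};

/* Number of trace lines emitted so far, so a caller can see how far we got. */
static int trace_lines;

/* Stand-in for the real port read; always returns 0x50 (DRDY | DSC). */
static unsigned char fake_inb(unsigned long port)
{
	(void)port;
	return 0x50;
}

/* Read a register with a flushed trace line on each side of the access. */
static unsigned char traced_inb(const struct fake_hwif *hwif,
				unsigned long port)
{
	unsigned char val;

	assert(hwif != NULL && hwif->INB != NULL);

	fprintf(stderr, "INB: before read of port 0x%lx\n", port);
	fflush(stderr);
	trace_lines++;

	/* In the failure reported here, this call is the one that hangs. */
	val = hwif->INB(port);

	fprintf(stderr, "INB: after read, value 0x%02x\n", val);
	fflush(stderr);
	trace_lines++;

	return val;
}
```

With hwif.INB pointing at fake_inb, a call to traced_inb(&hwif, port) emits
both lines; in the hang described above only the first one would appear.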





* RE: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 17:57 [BUG] ide dma_timer_expiry, then hard lockup Linas Vepstas
@ 2007-06-18 18:11 ` Stuart_Hayes
  2007-06-19 14:07   ` Sergei Shtylyov
  2007-06-18 20:27 ` Alan Cox
  2007-06-22 15:39 ` Sergei Shtylyov
  2 siblings, 1 reply; 28+ messages in thread
From: Stuart_Hayes @ 2007-06-18 18:11 UTC (permalink / raw)
  To: linas, linux-ide, linux-kernel


I think reading the IDE status register clears the interrupt in the IDE
device, which might be causing the drive to think it's OK to generate
another interrupt.  This could either cause it to get stuck trying to
service an interrupt that is never getting cleared as you suggested, or
possibly when the next IRQ comes in the IDE IRQ handler gets stuck
waiting for a spinlock that the code you're looking at already owns...?

Perhaps a printk in the IDE IRQ handler would be informative?  It
wouldn't help you figure out how it got where it is, but it might help
you figure out why the system is hanging.

Stuart
 



* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 17:57 [BUG] ide dma_timer_expiry, then hard lockup Linas Vepstas
  2007-06-18 18:11 ` Stuart_Hayes
@ 2007-06-18 20:27 ` Alan Cox
  2007-06-18 20:46   ` Linas Vepstas
  2007-06-22 15:39 ` Sergei Shtylyov
  2 siblings, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-06-18 20:27 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: linux-ide, linux-kernel

> ide_dma_timeout_retry() in ide-io.c 
>   prints the "hdc: DMA Timeout error" then calls
>   HWIF(drive)->ide_dma_end(drive);
>     which returns, and then calls 
>   hwif->INB(IDE_STATUS_REG) which is needed as an argument to ide_error()
> 
> But this hangs! -- The INB never returns.
> Now:  hwif->INB = ide_inb; in ide-iops.c

Yep and the I/O cycle never completes so the box hangs. This occurs if
the drive blows up and never switches IORDY to indicate completion. The
hpt will also do this sometimes if it gets addled by a confused drive,
while an intel one often won't.

Alan


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 20:27 ` Alan Cox
@ 2007-06-18 20:46   ` Linas Vepstas
  2007-06-18 21:04     ` Alan Cox
  0 siblings, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-18 20:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel

On Mon, Jun 18, 2007 at 09:27:04PM +0100, Alan Cox wrote:
> > ide_dma_timeout_retry() in ide-io.c 
> >   prints the "hdc: DMA Timeout error" then calls
> >   HWIF(drive)->ide_dma_end(drive);
> >     which returns, and then calls 
> >   hwif->INB(IDE_STATUS_REG) which is needed as an argument to ide_error()
> > 
> > But this hangs! -- The INB never returns.
> > Now:  hwif->INB = ide_inb; in ide-iops.c
> 
> Yep and the I/O cycle never completes so the box hangs. This occurs if
> the drive blows up and never switches IORDY to indicate completion. The
> hpt will also do this sometimes if it gets addled by a confused drive,
> while an intel one often won't.

So what do you suggest? (I could buy an alternate ide controller,
and hope that goes away, or just buy a different hard drive. But
that's beside the point).

I can prepare a patch, but only with a lot of guidance. I can test 
& debug, I'm highly motivated just right now ... 

--linas


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 20:46   ` Linas Vepstas
@ 2007-06-18 21:04     ` Alan Cox
  2007-06-18 21:22       ` Linas Vepstas
  2007-06-19 14:10       ` Sergei Shtylyov
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Cox @ 2007-06-18 21:04 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: linux-ide, linux-kernel

> So what do you suggest? (I could buy an alternate ide controller,
> and hope that goes away, or just buy a different hard drive. But
> that's beside the point).

The DMA timeout itself could be all sorts of things - crap driver, crap
hardware, PCI bus contention, noise, problem disk, phase of the moon.

> I can prepare a patch, but only with a lot of guidance. I can test 
> & debug, I'm highly motivated just right now ... 

If you've got a nice repeatable problem please try using the libata
driver. That handles the error paths differently and doesn't try a FIFO
drain which might matter in this case I guess.

Alan


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 21:04     ` Alan Cox
@ 2007-06-18 21:22       ` Linas Vepstas
  2007-06-19 14:56         ` bug in libata [was " Linas Vepstas
  2007-06-19 14:10       ` Sergei Shtylyov
  1 sibling, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-18 21:22 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel

On Mon, Jun 18, 2007 at 10:04:41PM +0100, Alan Cox wrote:
> 
> If you've got a nice repeatable problem 

Very highly repeatable :-(

> please try using the libata
> driver. That handles the error paths differently and doesn't try a FIFO
> drain which might matter in this case I guess.

Dohh, yes, of course. Completely forgot about that. (I assume you mean 
CONFIG_ATA).  Will report tomorrow.

--linas


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 18:11 ` Stuart_Hayes
@ 2007-06-19 14:07   ` Sergei Shtylyov
  2007-06-19 15:05     ` Linas Vepstas
  0 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 14:07 UTC (permalink / raw)
  To: linas; +Cc: Stuart_Hayes, linux-ide, linux-kernel

Hello.

Stuart_Hayes@Dell.com wrote:
> I think reading the IDE status register clears the interrupt in the IDE
> device, which might be causing the drive to think it's OK to generate
> another interrupt.

    This is not how IDE drives are supposed to act -- they won't proceed any 
further until "interrupt pending" condition is cleared, so these aren't 
supposed to be "stacked". This behavior however is not strictly specified by 
ATA standards IIRC, but I can't readily imagine such a situation anyway unless
tagged command queueing (which is not supported by IDE core) and/or ATAPI
command overlapping is in action...

>  This could either cause it to get stuck trying to
> service an interrupt that is never getting cleared as you suggested, or
> possibly when the next IRQ comes in the IDE IRQ handler gets stuck
> waiting for a spinlock that the code you're looking at already owns...?

    I could also imagine the HPT366 chip going mad and stalling the reads of
the taskfile regs forever because of the incomplete DMA or even the drive 
going mad and not replying to I/O cycles with proper -IORDY handshake (i.e. 
holding it low all the time)...

> Perhaps a printk in the IDE IRQ handler would be informative?  It
> wouldn't help you figure out how it got where it is, but it might help
> you figure out why the system is hanging.

> Stuart

> -----Original Message-----
> From: linux-ide-owner@vger.kernel.org
> [mailto:linux-ide-owner@vger.kernel.org] On Behalf Of Linas Vepstas
> Sent: Monday, June 18, 2007 12:57 PM
> To: linux-ide@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [BUG] ide dma_timer_expiry, then hard lockup

> I've got a hard lockup in the ide subsystem, probably due to some irq
> spew or something like that.
> 
> I've just bought a brand new Maxtor 320GB disk drive for the insane
> price of $70 US to replace another failing drive. It works well under
> light load; I was able to copy about 60GB to it. However, under heavy
> load, such as reconstruction of an MD
> RAID-1 array, it'll lock up the kernel.  Which means that my system
> won't boot :-(
> 
> I'm running 2.6.21.1, although the problem seems to occur in 2.6.19 and
> 2.6.18 too; it's been there a while; I vaguely remember similar problems
> in 2.6.5 or 2.6.10.
> 
> I get an
> "hdc: dma_timer_expiry: dma status == 0x21" 

    This means "DMA not complete".

> and 10 seconds later,

    The above condition causes another, 10 sec timeout...

> "hdc: DMA Timeout error"

> at which point the system is locked up hard.
> Magic sysreq does not work at all. The hard drive activity light stays
> fully lit.  Inserting printk's into the kernel, I find the hang to be in
> a surprising place: 

> ide_dma_timeout_retry() in ide-io.c 
>   prints the "hdc: DMA Timeout error" then calls
>   HWIF(drive)->ide_dma_end(drive);
>     which returns, and then calls 
>   hwif->INB(IDE_STATUS_REG) which is needed as an argument to
> ide_error()

> But this hangs! -- The INB never returns.
> Now:  hwif->INB = ide_inb; in ide-iops.c

> So putting a printk into ide_inb() shows that
> the printk before the readb() is printed, and the
> printk after the readb is not (!!)

> I find this rather surprising, as I can't imagine how the
> readb can fail. My current vague theory is that doing this
> readb makes the hard drive go really nuts, and it probably

    As I said, this is not the only way it all might have gone nuts... :-)

> ties some interrupt line high, and so the linux kernel 
> gets stuck trying to handle the irq flood. I just don't know
> enough about the i386 architecture, or about interrupts, to 
> prove or disprove this.

> Any suggestions, experiments, experimental patches, data gathering,
> etc. is welcome. The sooner, the better... 

> --linas

MBR, Sergei
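
The 0x21 value can be read bit by bit. What follows is a sketch, assuming
the standard bus-master IDE status register layout (bit 0 active, bit 1
error, bit 2 interrupt, bits 5 and 6 per-drive "DMA capable"). The macro and
function names are made up for illustration, and this is not the kernel's
dma_timer_expiry(). Under that layout 0x21 is "drive 0 DMA capable" plus
"transfer still active", with neither the error nor the interrupt bit set,
which is consistent with the "DMA not complete" reading above.

```c
#include <assert.h>
#include <stdio.h>

/* Bus-master IDE status register bits (assumed standard layout). */
#define BM_STAT_ACTIVE    0x01	/* DMA transfer still in progress */
#define BM_STAT_ERROR     0x02	/* bus-master error */
#define BM_STAT_INTR      0x04	/* device raised its interrupt */
#define BM_STAT_DRV0_DMA  0x20	/* drive 0 marked DMA capable */
#define BM_STAT_DRV1_DMA  0x40	/* drive 1 marked DMA capable */

/* Hypothetical classification of a status byte, checked in priority order. */
enum bm_verdict { BM_FAILED, BM_STILL_RUNNING, BM_INTR_PENDING, BM_IDLE };

static enum bm_verdict bm_classify(unsigned char stat)
{
	if (stat & BM_STAT_ERROR)
		return BM_FAILED;
	if (stat & BM_STAT_ACTIVE)
		return BM_STILL_RUNNING;
	if (stat & BM_STAT_INTR)
		return BM_INTR_PENDING;
	return BM_IDLE;
}

/* Format a one-line description into out; returns out for convenience. */
static const char *bm_describe(unsigned char stat, char *out, size_t len)
{
	/* Indexed by enum bm_verdict, in declaration order. */
	static const char *const names[] = {
		"error", "transfer still active", "interrupt pending", "idle"
	};

	assert(out != NULL && len > 0);
	snprintf(out, len, "dma status 0x%02x: %s", stat,
		 names[bm_classify(stat)]);
	return out;
}
```

Here bm_classify(0x21) yields BM_STILL_RUNNING: the engine still believes a
transfer is in flight, so waiting a further timeout period is a plausible
reaction before giving up.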


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 21:04     ` Alan Cox
  2007-06-18 21:22       ` Linas Vepstas
@ 2007-06-19 14:10       ` Sergei Shtylyov
  2007-06-19 14:19         ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 14:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linas Vepstas, linux-ide, linux-kernel

Hello.

Alan Cox wrote:
>>I can prepare a patch, but only with a lot of guidance. I can test 
>>& debug, I'm highly motivated just right now ... 

> If you've got a nice repeatable problem please try using the libata
> driver. That handles the error paths differently and doesn't try a FIFO
> drain which might matter in this case I guess.

    FIFO drain for DMA commands?

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 14:10       ` Sergei Shtylyov
@ 2007-06-19 14:19         ` Alan Cox
  2007-06-19 14:24           ` Sergei Shtylyov
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-06-19 14:19 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Linas Vepstas, linux-ide, linux-kernel

On Tue, 19 Jun 2007 18:10:04 +0400
Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote:

> Hello.
> 
> Alan Cox wrote:
> >>I can prepare a patch, but only with a lot of guidance. I can test 
> >>& debug, I'm highly motivated just right now ... 
> 
> > If you've got a nice repeatable problem please try using the libata
> > driver. That handles the error paths differently and doesn't try a FIFO
> > drain which might matter in this case I guess.
> 
>     FIFO drain for DMA commands?

Welcome to the old IDE layer which I am so glad I left behind 8)

ide_ata_error will try and do a PIO flush regardless of the command type
if DRQ_STAT is asserted. See ide_dma_intr -> ide_error -> ...

Alan


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 14:19         ` Alan Cox
@ 2007-06-19 14:24           ` Sergei Shtylyov
  2007-06-19 15:38             ` Mark Lord
  0 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 14:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linas Vepstas, linux-ide, linux-kernel

Alan Cox wrote:

>>>>I can prepare a patch, but only with a lot of guidance. I can test 
>>>>& debug, I'm highly motivated just right now ... 

>>>If you've got a nice repeatable problem please try using the libata
>>>driver. That handles the error paths differently and doesn't try a FIFO
>>>drain which might matter in this case I guess.

>>    FIFO drain for DMA commands?

> Welcome to the old IDE layer which I am so glad I left behind 8)

> ide_ata_error will try and do a PIO flush regardless of the command type
> if DRQ_STAT is asserted. See ide_dma_intr -> ide_error -> ...

    Indeed... but the thing is we don't know what's asserted in this case -- 
remember, it's reading the status register that locks everything up...

> Alan

MBR, Sergei


* bug in libata [was Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 21:22       ` Linas Vepstas
@ 2007-06-19 14:56         ` Linas Vepstas
  0 siblings, 0 replies; 28+ messages in thread
From: Linas Vepstas @ 2007-06-19 14:56 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, linux

On Mon, Jun 18, 2007 at 04:22:38PM -0500, linas wrote:
> On Mon, Jun 18, 2007 at 10:04:41PM +0100, Alan Cox wrote:
> > please try using the libata
> > driver. 

It's worse. I get a hard hang (sysrq doesn't work) during boot,
just when the system goes to read the partition table.

Recap: this is an older dual cpu intel box, with a vintage
HPT366 (and not a newer HPT370) on the system planar.  I'm
testing a configuration that works fine with the old ide
drivers.

It looks like libata and scsi come up. The disk is correctly
recognized; i.e. its brand, model number & size are correctly 
reported. printk shows that it hangs in msdos_partition, trying 
to read the partition table. The drive light is on full-solid,
again suggesting a possible irq storm.

Same behaviour for both 2.6.22-rc5-git1 and for 2.6.22-rc4-mm2

I seem to have trouble turning on scsi logging; it doesn't
seem to generate any output ... 

Any suggestions on how to proceed? 

--linas


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 14:07   ` Sergei Shtylyov
@ 2007-06-19 15:05     ` Linas Vepstas
  2007-06-19 16:10       ` Sergei Shtylyov
  0 siblings, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-19 15:05 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Stuart_Hayes, linux-ide, linux-kernel

Hi Sergei,

On Tue, Jun 19, 2007 at 06:07:07PM +0400, Sergei Shtylyov wrote:
> 
> Stuart_Hayes@Dell.com wrote:
> >I think reading the IDE status register clears the interrupt in the IDE
> >device, which might be causing the drive to think it's OK to generate
> >another interrupt.
> 
>    This is not how IDE drives are supposed to act -- they won't proceed any 
> further until "interrupt pending" condition is cleared, so these aren't 
> supposed to be "stacked". This behavior however is not strictly specified 
> by ATA standards IIRC, but I can't readily imagine such a situation anyway
> unless tagged command queueing  (which is not supported by IDE core) and/or 
> ATAPI command overlapping is in action...

The problem only manifests during high io load; perhaps a missing mutex
somewhere is blasting one thing too many out to the hard drive?

> > This could either cause it to get stuck trying to
> >service an interrupt that is never getting cleared as you suggested, or
> >possibly when the next IRQ comes in the IDE IRQ handler gets stuck
> >waiting for a spinlock that the code you're looking at already owns...?
> 
>    I could also imagine the HPT366 chip going mad and stalling the reads of
> the taskfile regs forever because of the incomplete DMA or even the drive 
> going mad and not replying to I/O cycles with proper -IORDY handshake (i.e. 
> holding it low all the time)...

In my case, ctrl-alt-sysrq doesn't work, which makes it hard to debug.

I'm thinking that trying to debug libata is a better idea, rather than
investing time in ide, right?  Although at the moment, libata works even 
less; see other email.

--linas



* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 14:24           ` Sergei Shtylyov
@ 2007-06-19 15:38             ` Mark Lord
  2007-06-19 15:51               ` Sergei Shtylyov
  2007-06-19 16:17               ` Alan Cox
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Lord @ 2007-06-19 15:38 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Alan Cox, Linas Vepstas, linux-ide, linux-kernel

Sergei Shtylyov wrote:
> Alan Cox wrote:
> 
>>>>> I can prepare a patch, but only with a lot of guidance. I can test 
>>>>> & debug, I'm highly motivated just right now ... 
> 
>>>> If you've got a nice repeatable problem please try using the libata
>>>> driver. That handles the error paths differently and doesn't try a FIFO
>>>> drain which might matter in this case I guess.
> 
>>>    FIFO drain for DMA commands?
> 
>> Welcome to the old IDE layer which I am so glad I left behind 8)
> 
>> ide_ata_error will try and do a PIO flush regardless of the command type
>> if DRQ_STAT is asserted. See ide_dma_intr -> ide_error -> ...
> 
>    Indeed... but the thing is we don't know what's asserted in this case 
> -- remember, it's reading the status register that locks everything up...

Exactly.  And IORDY shouldn't really apply there,
unless some nitwit standards person wrote it into a spec..

-ml


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 15:38             ` Mark Lord
@ 2007-06-19 15:51               ` Sergei Shtylyov
  2007-06-19 16:17               ` Alan Cox
  1 sibling, 0 replies; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 15:51 UTC (permalink / raw)
  To: Mark Lord; +Cc: Alan Cox, Linas Vepstas, linux-ide, linux-kernel

Mark Lord wrote:

>>>>>> I can prepare a patch, but only with a lot of guidance. I can test 
>>>>>> & debug, I'm highly motivated just right now ... 

>>>>> If you've got a nice repeatable problem please try using the libata
>>>>> driver. That handles the error paths differently and doesn't try a 
>>>>> FIFO
>>>>> drain which might matter in this case I guess.

>>>>    FIFO drain for DMA commands?

>>> Welcome to the old IDE layer which I am so glad I left behind 8)

>>> ide_ata_error will try and do a PIO flush regardless of the command type
>>> if DRQ_STAT is asserted. See ide_dma_intr -> ide_error -> ...

>>    Indeed... but the thing is we don't know what's asserted in this 
>> case -- remember, it's reading the status register that locks 
>> everything up...

> Exactly.  And IORDY shouldn't really apply there,
> unless some nitwit standards person wrote it into a spec..

    Wrote what? IORDY throttling does *apply* to both data and non-data 
register accesses, of course.

> -ml

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 15:05     ` Linas Vepstas
@ 2007-06-19 16:10       ` Sergei Shtylyov
  2007-06-19 16:48         ` Linas Vepstas
  0 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 16:10 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Stuart_Hayes, linux-ide, linux-kernel

Hello.

Linas Vepstas wrote:

>>Stuart_Hayes@Dell.com wrote:

>>>I think reading the IDE status register clears the interrupt in the IDE
>>>device, which might be causing the drive to think it's OK to generate
>>>another interrupt.

>>   This is not how IDE drives are supposed to act -- they won't proceed any 
>>further until "interrupt pending" condition is cleared, so these aren't 
>>supposed to be "stacked". This behavior however is not strictly specified 
>>by ATA standards IIRC, but I can't readily imagine such a situation anyway
>>unless tagged command queueing  (which is not supported by IDE core) and/or 
>>ATAPI command overlapping is in action...

> The problem only manifests during high io load; perhaps a missing mutex
> somewhere is blasting one thing too many out to the hard drive?

    Hm... not sure about this.

>>>This could either cause it to get stuck trying to
>>>service an interrupt that is never getting cleared as you suggested, or
>>>possibly when the next IRQ comes in the IDE IRQ handler gets stuck
>>>waiting for a spinlock that the code you're looking at already owns...?

>>   I could also imagine the HPT366 chip going mad and stalling the reads of
>>the taskfile regs forever because of the incomplete DMA or even the drive 
>>going mad and not replying to I/O cycles with proper -IORDY handshake (i.e. 
>>holding it low all the time)...

> In my case, ctrl-alt-sysrq doesn't work, which makes it hard to debug.

> I'm thinking that trying to debug libata is a better idea, rather than
> investing time in ide, right?  Although at the moment, libata works even 
> less; see other email.

    Which makes me think this really is some *hardware* issue.

> --linas

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 15:38             ` Mark Lord
  2007-06-19 15:51               ` Sergei Shtylyov
@ 2007-06-19 16:17               ` Alan Cox
  2007-06-19 16:32                 ` Sergei Shtylyov
  1 sibling, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-06-19 16:17 UTC (permalink / raw)
  To: Mark Lord; +Cc: Sergei Shtylyov, Linas Vepstas, linux-ide, linux-kernel

> >    Indeed... but the thing is we don't know what's asserted in this case 
> > -- remember, it's reading the status register that locks everything up...
> 
> Exactly.  And IORDY shouldn't really apply there,
> unless some nitwit standards person wrote it into a spec..

Could it be that we need to reset the state machine at this point before we
touch the registers again? It wouldn't be the first controller with
this limitation left undocumented.

On the 370 we already 

Linas: to debug the libata one, turn on ATA_DEBUG and
ATA_VERBOSE_DEBUG in include/linux/libata.h and it should spew
diagnostics before the freeze. I suspect that's a different problem from the
hang you see now, but I'd like to debug both.

Alan
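
For reference, switches of this kind are compile-time macros. The userspace
sketch below shows the general pattern, assuming the two symbols simply gate
two levels of debug print macro. The macro bodies here use fprintf() and a
counter and are illustrative, not copied from libata.h; probe_sketch() and
debug_lines are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define ATA_DEBUG		/* first level of debug output */
#define ATA_VERBOSE_DEBUG	/* second, chattier level */

/* Counts emitted messages, only so the effect of the switches is observable. */
static int debug_lines;

#ifdef ATA_DEBUG
#define DPRINTK(...) \
	do { debug_lines++; fprintf(stderr, "%s: ", __func__); \
	     fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define DPRINTK(...) do { } while (0)
#endif

/* The verbose level only exists when the first level is enabled too. */
#if defined(ATA_DEBUG) && defined(ATA_VERBOSE_DEBUG)
#define VPRINTK(...) \
	do { debug_lines++; fprintf(stderr, "%s: ", __func__); \
	     fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define VPRINTK(...) do { } while (0)
#endif

/* Hypothetical code path that logs at both levels. */
static void probe_sketch(const char *dev)
{
	assert(dev != NULL);
	DPRINTK("probing %s\n", dev);
	VPRINTK("about to read partition table on %s\n", dev);
}
```

With both symbols defined, one call to probe_sketch() emits two lines; with
neither defined, the macros compile away to nothing, which is why the header
has to be edited and the kernel rebuilt to get the extra output.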


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 16:17               ` Alan Cox
@ 2007-06-19 16:32                 ` Sergei Shtylyov
  0 siblings, 0 replies; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 16:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mark Lord, Linas Vepstas, linux-ide, linux-kernel

Alan Cox wrote:
>>>   Indeed... but the thing is we don't know what's asserted in this case 
>>>-- remember, it's reading the status register that locks everything up...

>>Exactly.  And IORDY shouldn't really apply there,
>>unless some nitwit standards person wrote it into a spec..

> Could it be that we need to reset the state machine at this point before we
> touch the registers again? It wouldn't be the first controller with
> this limitation left undocumented.

> On the 370 we already 

    Yeah, that could be. And because the IORDY pin becomes DSTROBE for UltraDMA, it
might have gotten stuck low due to this (if the chip never asserted STOP)...

> Alan

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 16:10       ` Sergei Shtylyov
@ 2007-06-19 16:48         ` Linas Vepstas
  2007-06-19 18:43           ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-19 16:48 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Stuart_Hayes, linux-ide, linux-kernel

On Tue, Jun 19, 2007 at 08:10:25PM +0400, Sergei Shtylyov wrote:
> 
> >I'm thinking that trying to debug libata is a better idea, rather than
> >investing time in ide, right?  Although at the moment, libata works even 
> >less; see other email.
> 
>    Which makes me think this really is some *hardware* issue.

There are two distinct issues.
-- libata locks up in partition table read on an hpt366+old maxtor disk
   that has been working fine for many years with old ide driver. (It
   still works fine when I boot to the alternate ide-based kernel).

-- ide driver locks up on hpt366+new maxtor disk under heavy 
   i/o load. I was able to copy 60GB from old to new disk without a
   problem; however, raid reconstruction locks it up, maybe after 5-15
   seconds.

   This probably is "hardware related"; it's something that the new
   hard drive does. Given that it's being sold at a big discount, it
   may even be that the sellers know that this is a crappy disk. :-)

   All I want is some way of resetting the disk, and continuing on.

I'm stalled in debugging; I'm not sure what I'm looking for.

--linas




* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 16:48         ` Linas Vepstas
@ 2007-06-19 18:43           ` Bartlomiej Zolnierkiewicz
  2007-06-19 20:07             ` Sergei Shtylyov
  0 siblings, 1 reply; 28+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-06-19 18:43 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Sergei Shtylyov, Stuart_Hayes, linux-ide, linux-kernel


Hi,

On Tuesday 19 June 2007, Linas Vepstas wrote:
> On Tue, Jun 19, 2007 at 08:10:25PM +0400, Sergei Shtylyov wrote:
> > 
> > >I'm thinking that trying to debug libata is a better idea, rather than
> > >investing time in ide, right?  Although at the moment, libata works even 
> > >less; see other email.
> > 
> >    Which makes me think this really is some *hardware* issue.

Linas, have you checked that there are no firmware updates available
for this drive?

> There are two distinct issues.
> -- libata locks up in partition table read on an hpt366+old maxtor disk
>    that has been working fine for many years with old ide driver. (It
>    still works fine when I boot to the alternate ide-based kernel).
> 
> -- ide driver locks up on hpt366+new maxtor disk under heavy 
>    i/o load. I was able to copy 60GB from old to new disk without a
>    problem; however, raid reconstruction locks it up, maybe after 5-15
>    seconds.
> 
>    This probably is "hardware related"; it's something that the new
>    hard drive does. Given that it's being sold at a big discount, it
>    may even be that the sellers know that this is a crappy disk. :-)
> 
>    All I want is some way of resetting the disk, and continuing on.

It would be useful to see hdparm --Istdout output for *both* disks.

> I'm stalled in debugging; I'm not sure what I'm looking for.

Sergei, do you think that testing the drive with DMA disabled may
tell us something new?

Thanks,
Bart


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 18:43           ` Bartlomiej Zolnierkiewicz
@ 2007-06-19 20:07             ` Sergei Shtylyov
  2007-06-20 16:28               ` Linas Vepstas
  0 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-19 20:07 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Linas Vepstas, Stuart_Hayes, linux-ide, linux-kernel

Bartlomiej Zolnierkiewicz wrote:

>>There are two distinct issues.
>>-- libata locks up in partition table read on an hpt366+old maxtor disk
>>   that has been working fine for many years with old ide driver. (It
>>   still works fine when I boot to the alternate ide-based kernel).

>>-- ide driver locks up on hpt366+new maxtor disk under heavy 
>>   i/o load. I was able to copy 60GB from old to new disk without a
>>   problem; however, raid reconstruction locks it up, maybe after 5-15
>>   seconds.

>>   This probably is "hardware related"; it's something that the new 
>>   hard drive does. Given that it's being sold at a big discount, it
>>   may even be that the sellers know that this is a crappy disk. :-)

>>   All I want is some way of resetting the disk, and continuing on.

> It would be useful to see hdparm --Istdout output for *both* disks.

>>I'm stalled in debugging; I'm not sure what I'm looking for.

> Sergei, do you think that testing the drive with DMA disabled may
> tell us something new?

    Not sure. I'll try to come up with a patch resetting the state machine in 
dma_timeout() method (following Alan's idea) -- HPT366 regs are too different 
to reuse the one for HPT370.

> Thanks,
> Bart

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-19 20:07             ` Sergei Shtylyov
@ 2007-06-20 16:28               ` Linas Vepstas
  2007-06-20 17:01                 ` Alan Cox
  0 siblings, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-20 16:28 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Bartlomiej Zolnierkiewicz, Stuart_Hayes, linux-ide, linux-kernel

On Wed, Jun 20, 2007 at 12:07:19AM +0400, Sergei Shtylyov wrote:
> Bartlomiej Zolnierkiewicz wrote:
> 
> [...firmware...]

Google seems to show that there are no publicly available 
firmware updates for Maxtor disks.

> >It would be useful to see hdparm --Istdout output for *both* disks.

Let's do one at a time. Appended below is the one for the 
older, "known good" disk.

> >Sergei, do you think that testing the drive with DMA disabled may
> >tell us something new?

FWIW, the "buggy" disk seems to work fine with DMA turned off (with
hdparm). I just copied 60GB from it; although this did take about 16
hours at high cpu usage.... There were maybe a dozen DriveReady
SeekComplete Timeout errors clustered a few minutes apart.  

----
Re: The libata problem. This is a hang during the read of the 
partition table during boot, of the "known good" disk.  I turned
on scsi and libata debugging, reproduced the hang, diligently
copied to a piece of paper, but then left the darned piece of 
paper at home. 

From what I remember, the ata command was translated to scsi
by ata_queuecommand, and then handed off to the scsi subsystem. 
Presumably, it's sent to the drive, but the drive does not respond.

30 seconds later, the scsi eh runs, and hands the error back to 
libata, which takes a few ineffectual shots at recovery, and 
then hangs. 

I'll try to get the details later.

Is there a way of viewing the contents of the command queue on
the hard drive, to see if the command actually made it across?

--linas

/dev/hda:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24792/255/63, sectors = 398297088, start = 0
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 5936 3130 4d45 4345 2020 2020
2020 2020 2020 2020 0003 3e00 0039 5941
5234 3142 5730 4d61 7874 6f72 2036 5932
3030 5030 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 ffff 0001
003f ffc1 003e 0110 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
00fe 001e 7c6b 7f09 4003 7c69 3e01 4003
107f 0000 0000 0000 fffe 600d c0fe 0000
0000 0000 0000 0000 8800 17bd 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0001 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 60a5
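
For reference, the hex words above are the drive's 256-word IDENTIFY DEVICE
block. The following is a minimal sketch (Python, added for illustration, not
part of the original message) of how the ASCII fields can be read out of such
a dump. It assumes the standard ATA layout: serial number in words 10-19,
firmware revision in words 23-26, model in words 27-46, two characters per
word with the high byte first. Only the first six rows of the dump are needed;
the helper name ata_string is made up for this example.

```python
# First 48 words (six rows) of the IDENTIFY dump above.
DUMP = """
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 5936 3130 4d45 4345 2020 2020
2020 2020 2020 2020 0003 3e00 0039 5941
5234 3142 5730 4d61 7874 6f72 2036 5932
3030 5030 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
"""

words = [int(w, 16) for w in DUMP.split()]


def ata_string(words, first, last):
    """Decode words first..last (inclusive) as a space-padded ATA string."""
    # Each 16-bit word holds two ASCII characters, high byte first.
    raw = b"".join(w.to_bytes(2, "big") for w in words[first:last + 1])
    return raw.decode("ascii").strip()


serial = ata_string(words, 10, 19)
firmware = ata_string(words, 23, 26)
model = ata_string(words, 27, 46)
print(model, firmware, serial)
```

For this dump the model field decodes to "Maxtor 6Y200P0", i.e. the older
"known good" disk.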



* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-20 16:28               ` Linas Vepstas
@ 2007-06-20 17:01                 ` Alan Cox
  2007-06-21 17:58                   ` Sergei Shtylyov
  2007-06-21 19:47                   ` Linas Vepstas
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Cox @ 2007-06-20 17:01 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Sergei Shtylyov, Bartlomiej Zolnierkiewicz, Stuart_Hayes,
	linux-ide, linux-kernel

> Google seems to show that there are no publicly available 
> firmware updates for Maxtor disks.

There are for some but only if you irritate the tech support people.

> hours at high cpu usage.... There were maybe a dozen DriveReady
> SeekComplete Timeout errors clustered a few minutes apart.  

That suggests the drive is having problems occasionally and that the DMA
path code then blows up when they occur.

> Is there a way of viewing the contents of the command queue on
> the hard drive, to see if the command actually made it across?

queue ? You are overestimating IDE ;)

When the command is written you wait 400 ns and then BSY is supposed to be
asserted. DRQ and other bits then handshake the data at the software
level with IORDY doing it at the hardware level for PIO (except in early
drives/low speeds where it's done by the prayer and timing tolerance
approach).

It's unlikely the command got lost. The IRQ could have done but the error
path tries to spot that case by reading the status register - which
hangs. So in theory it could be a lost IRQ and if the reset works we'll
find that out.
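
To make the status handshake above concrete, here is a small sketch (Python,
added for illustration) that decodes an ATA status register byte into flag
names in the style the old ide driver prints. The bit values are the standard
ATA ones; the function name decode_status is invented for this example. The
value 0x50, for instance, corresponds to the "DriveReady SeekComplete" pair
mentioned earlier in the thread.

```python
# Standard ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of the flags set in an ATA status byte."""
    if status & 0x80:
        # While BSY is set the remaining bits are not valid.
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if status & bit]


print(decode_status(0x50))
```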

Alan




* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-20 17:01                 ` Alan Cox
@ 2007-06-21 17:58                   ` Sergei Shtylyov
  2007-06-21 21:41                     ` Alan Cox
  2007-06-21 19:47                   ` Linas Vepstas
  1 sibling, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-21 17:58 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linas Vepstas, Bartlomiej Zolnierkiewicz, Stuart_Hayes,
	linux-ide, linux-kernel

Hello.

Alan Cox wrote:

>>Google seems to show that there are no publicly available 
>>firmware updates for Maxtor disks.

> There are for some but only if you irritate the tech support people.

>>hours at high cpu usage.... There were maybe a dozen DriveReady
>>SeekComplete Timeout errors clustered a few minutes apart.  

> That suggests the drive is having problems occasionally and that the DMA
> path code then blows up when they occur.

>>Is there a way of viewing the contents of the command queue on
>>the hard drive, to see if the command actually made it across?

> queue ? You are overestimating IDE ;)

    He's not -- there has been queued command support since ATA[PI]-5. I'm not sure 
why but Linux decided not to support it.

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-20 17:01                 ` Alan Cox
  2007-06-21 17:58                   ` Sergei Shtylyov
@ 2007-06-21 19:47                   ` Linas Vepstas
  2007-06-21 22:04                     ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: Linas Vepstas @ 2007-06-21 19:47 UTC (permalink / raw)
  To: Alan Cox
  Cc: Sergei Shtylyov, Bartlomiej Zolnierkiewicz, Stuart_Hayes,
	linux-ide, linux-kernel

On Wed, Jun 20, 2007 at 06:01:23PM +0100, Alan Cox wrote:
> 
> It's unlikely the command got lost. The IRQ could have done but the error
> path tries to spot that case by reading the status register - which
> hangs. So in theory it could be a lost IRQ and if the reset works we'll
> find that out.

OK, here's the libata trace info (transcribed by hand, may have typos,
the numerical values should be correct).

This is during the first read of the partition table, during boot.

ata_scsi_dumb_cb: CDB(:1:0,0,0) 28 00 00 00 00 00 00 00 08
ata_scsi_translate: ENTER
scsi_10_lba_len: ten-byte command
ata_sg_setup: ENTER, ata1
ata_sg_setup: 1 sg elements mapped
ata_fill_sg: PRD[0] = (0x2FEEF000, 0x1000)
ata1: ata_dev_select: ENTER, device 0, wait 1
ata_tf_load: feat 0x0 nsect 0x8 lba 0x0 0x0 0x0
ata_tf_load: device 0xE0
ata_exec_command: ata1: cmd 0xc8
ata_scsi_translate: EXIT

then, 30 seconds later:

sd 0:0:0:0 [sda] Done: 0xeff3aba0 TIMEOUT
sd 0:0:0:0 [sda] Result: host_byte=DID_OK driver_byte=DRV_OK, SUG_OK
sd 0:0:0:0 [sda] CDB: Read(10): 28 00 00 ... 00 08 00
sd 0:0:0:0 [sda] scsi host busy 1 failed 0
ata_scsi_timed_out: ENTER
ata_scsi_timed_out: EXIT, ret=0
ata_port_flush_task: ENTER
ata_port_flush_task: flush #1
ata1: ata_port_flush_task: flush #2
ata_port_flush_task: EXIT

Then a hard hang here.

This was on 2.6.22-rc5-git1
Again, this disk and controller combo work spotlessly when using 
the ide drivers.
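
The numbers in the trace are at least internally consistent, which a short
sketch can check (Python, added for illustration; parse_read10 is a made-up
helper). It assumes the full ten-byte Read(10) CDB ends in a 00 control byte,
as the timeout log suggests, since the hand-copied first line shows only nine
bytes, and it assumes 512-byte sectors.

```python
import struct


def parse_read10(cdb):
    """Split a 10-byte SCSI Read(10) CDB into (LBA, block count)."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    # Big-endian: opcode, flags, 32-bit LBA, group, 16-bit length, control.
    _op, _flags, lba, _group, nblocks, _ctrl = struct.unpack(">BBIBHB", cdb)
    return lba, nblocks


# Assumed full CDB: the nine transcribed bytes plus a trailing control byte.
cdb = bytes.fromhex("28 00 00 00 00 00 00 00 08 00")
lba, nblocks = parse_read10(cdb)

SECTOR_SIZE = 512  # assumption: 512-byte sectors
nbytes = nblocks * SECTOR_SIZE
print(lba, nblocks, hex(nbytes))
```

A read of 8 blocks at LBA 0 agrees with "nsect 0x8 lba 0x0" in ata_tf_load,
and 8 * 512 bytes is the 0x1000 length of PRD[0].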


--linas


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-21 17:58                   ` Sergei Shtylyov
@ 2007-06-21 21:41                     ` Alan Cox
  0 siblings, 0 replies; 28+ messages in thread
From: Alan Cox @ 2007-06-21 21:41 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Linas Vepstas, Bartlomiej Zolnierkiewicz, Stuart_Hayes,
	linux-ide, linux-kernel

> > queue ? You are overestimating IDE ;)
> 
>     He's not -- there has been queued command support since ATA[PI]-5. I'm not sure 
> why but Linux decided not to support it.

Almost no hardware supports it and the functionality is really really
ugly to use when it works at all - NCQ is rather more elegant.

Alan


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-21 19:47                   ` Linas Vepstas
@ 2007-06-21 22:04                     ` Alan Cox
  0 siblings, 0 replies; 28+ messages in thread
From: Alan Cox @ 2007-06-21 22:04 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Sergei Shtylyov, Bartlomiej Zolnierkiewicz, Stuart_Hayes,
	linux-ide, linux-kernel

> sd 0:0:0:0 [sda] Done: 0xeff3aba0 TIMEOUT
> sd 0:0:0:0 [sda] Result: host_byte=DID_OK driver_byte=DRV_OK, SUG_OK
> sd 0:0:0:0 [sda] CDB: Read(10): 28 00 00 ... 00 08 00
> sd 0:0:0:0 [sda] scsi host busy 1 failed 0
> ata_scsi_timed_out: ENTER
> ata_scsi_timed_out: EXIT, ret=0
> ata_port_flush_task: ENTER
> ata_port_flush_task: flush #1
> ata1: ata_port_flush_task: flush #2
> ata_port_flush_task: EXIT
> 
> Then a hard hang here.

Thanks 

Added to my bug collection to peer at.


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-18 17:57 [BUG] ide dma_timer_expiry, then hard lockup Linas Vepstas
  2007-06-18 18:11 ` Stuart_Hayes
  2007-06-18 20:27 ` Alan Cox
@ 2007-06-22 15:39 ` Sergei Shtylyov
  2007-06-29 18:52   ` Sergei Shtylyov
  2 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-22 15:39 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: linux-ide, linux-kernel

Linas Vepstas wrote:

> I've got a hard lockup in the ide subsystem, probably
> due to some irq spew or something like that.

> I've just bought a brand new Maxtor 320GB disk drive 
> for the insane price of $70 US to replace another 
> failing drive. It works well under light load;
> I was able to copy about 60GB to it. However, 
> under heavy load, such as reconstruction of an MD 
> RAID-1 array, it'll lock up the kernel.  Which means
> that my system won't boot :-(

> I'm running 2.6.21.1, although the problem seems to occur 
> in 2.6.19 and 2.6.18 too; it's been there a while; I vaguely
> remember similar problems in 2.6.5 or 2.6.10.

    Ah... so you're saying that the old disk works OK, yet you've already 
observed similar behavior on other disks... That speaks against blacklisting the 
drive.

> I can get the system to boot by sneaking in an 
> "hdparm -d0 /dev/hdc" early in the boot process, to turn off 

    You could probably do the same trick and specify a lower DMA speed by using 
the -X n option (where n ranges from 64 to 70 for UltraDMA modes 0 to 6) and see 
if it changes anything...

> the use of DMA, but it seems that PIO is so slow, that it takes 
> forever to get NFS started.

    You can use 'ide=nodma' kernel option for this.
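
The -X numbering mentioned above follows hdparm's transfer-mode encoding:
8 + n for PIO mode n, 32 + n for multiword DMA mode n, and 64 + n for UltraDMA
mode n. A small sketch of that mapping (Python, for illustration only;
xfer_mode_value is a made-up name, not an hdparm interface):

```python
# Base values of hdparm's -X argument for each transfer-mode family.
XFER_BASE = {"pio": 8, "mdma": 32, "udma": 64}


def xfer_mode_value(kind, mode):
    """Return the -X argument for transfer mode `mode` of family `kind`."""
    if mode < 0:
        raise ValueError("mode number must be non-negative")
    return XFER_BASE[kind] + mode


# Example: build the command line for UltraDMA mode 2 on the drive in question.
print("hdparm -X", xfer_mode_value("udma", 2), "/dev/hdc")
```

So stepping the new drive down to UltraDMA mode 2 would be
"hdparm -X 66 /dev/hdc", and 64 through 70 cover UltraDMA modes 0 to 6.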

MBR, Sergei


* Re: [BUG] ide dma_timer_expiry, then hard lockup
  2007-06-22 15:39 ` Sergei Shtylyov
@ 2007-06-29 18:52   ` Sergei Shtylyov
  0 siblings, 0 replies; 28+ messages in thread
From: Sergei Shtylyov @ 2007-06-29 18:52 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: linux-ide, linux-kernel

Hello, I wrote:

>> I've got a hard lockup in the ide subsystem, probably
>> due to some irq spew or something like that.

>> I've just bought a brand new Maxtor 320GB disk drive for the insane 
>> price of $70 US to replace another failing drive. It works well under 
>> light load;
>> I was able to copy about 60GB to it. However, under heavy load, such 
>> as reconstruction of an MD RAID-1 array, it'll lock up the kernel.  
>> Which means
>> that my system won't boot :-(

>> I'm running 2.6.21.1, although the problem seems to occur in 2.6.19 
>> and 2.6.18 too; it's been there a while; I vaguely
>> remember similar problems in 2.6.5 or 2.6.10.

>    Ah... so you're saying that the old disk works OK, yet you've already 
> observed similar behavior on other disks... That speaks against 
> blacklisting the drive.

    I'm going to shortly post the patches blacklisting the drive and using the 
proper HPT36x enablebits (as much as this stupid design may be fixed :-)...
Please test.

>> I can get the system to boot by sneaking in an "hdparm -d0 /dev/hdc" 
>> early in the boot process, to turn off 

>    You could probably do the same trick and specify a lower DMA speed by 
> using the -X n option (where n ranges from 64 to 70 for UltraDMA modes 0 to 
> 6) and see if it changes anything...

    Note that for the hpt366.c driver, this is also achievable by setting 
HPT366_ALLOW_ATA66_[34] to 0 and recompiling. It will limit its UltraDMA 
capabilities to modes 2 or 3.

MBR, Sergei


end of thread, other threads:[~2007-06-29 18:51 UTC | newest]

Thread overview: 28+ messages
2007-06-18 17:57 [BUG] ide dma_timer_expiry, then hard lockup Linas Vepstas
2007-06-18 18:11 ` Stuart_Hayes
2007-06-19 14:07   ` Sergei Shtylyov
2007-06-19 15:05     ` Linas Vepstas
2007-06-19 16:10       ` Sergei Shtylyov
2007-06-19 16:48         ` Linas Vepstas
2007-06-19 18:43           ` Bartlomiej Zolnierkiewicz
2007-06-19 20:07             ` Sergei Shtylyov
2007-06-20 16:28               ` Linas Vepstas
2007-06-20 17:01                 ` Alan Cox
2007-06-21 17:58                   ` Sergei Shtylyov
2007-06-21 21:41                     ` Alan Cox
2007-06-21 19:47                   ` Linas Vepstas
2007-06-21 22:04                     ` Alan Cox
2007-06-18 20:27 ` Alan Cox
2007-06-18 20:46   ` Linas Vepstas
2007-06-18 21:04     ` Alan Cox
2007-06-18 21:22       ` Linas Vepstas
2007-06-19 14:56         ` bug in libata [was " Linas Vepstas
2007-06-19 14:10       ` Sergei Shtylyov
2007-06-19 14:19         ` Alan Cox
2007-06-19 14:24           ` Sergei Shtylyov
2007-06-19 15:38             ` Mark Lord
2007-06-19 15:51               ` Sergei Shtylyov
2007-06-19 16:17               ` Alan Cox
2007-06-19 16:32                 ` Sergei Shtylyov
2007-06-22 15:39 ` Sergei Shtylyov
2007-06-29 18:52   ` Sergei Shtylyov
