linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ide-cd problem
@ 2004-11-20 18:42 Alan Chandler
  2004-11-20 19:47 ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-20 18:42 UTC (permalink / raw)
  To: linux-kernel, axboe

I have been trying to track down why all attempts to burn a cd on my ide cdrw 
fails (see bug #3741 on bugzilla ), with a subprocess of cdrecord ending up 
hanging in uninterruptable sleep state.

I think I understand what is happening, I just don't know what to do about it.

Inside drivers/ide/ide-cd.c

the ide_do_rw_cdrom routine has been called via a request with the request 
flags having the REQ_BLOCK_PC flag set.  The request->data_len of this 
request is set to 0.

This request is sent to the device and it generates interrupts to eventually 
land it up inside the routine cdrom_newpc_intr.

At this point the status register on the hardware is set to 0x58 - implying, I 
think that the DRQ_STAT bit is set and that something should be sent to the 
device.

Normally, because the requested data_len is not zero, the data is sent.  In 
this case however, because the original request had nothing to send, the 
while/if clauses to initiate a new transfer are skipped and the routine ends 
up setting a new interrupt handler address and returning to await an 
interrupt that will never come.

Question: should something validate that the request length is not zero 
earlier, or should there be a check in ide-cd.c, or is it my hardware (its a 
generic cd read/rewriter which announces itself as 'CW078D CD-R/RW')
-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-20 18:42 ide-cd problem Alan Chandler
@ 2004-11-20 19:47 ` Jens Axboe
  2004-11-21  0:53   ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-20 19:47 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Sat, Nov 20 2004, Alan Chandler wrote:
> I have been trying to track down why all attempts to burn a cd on my
> ide cdrw fails (see bug #3741 on bugzilla ), with a subprocess of
> cdrecord ending up hanging in uninterruptable sleep state.
> 
> I think I understand what is happening, I just don't know what to do
> about it.
> 
> Inside drivers/ide/ide-cd.c
> 
> the ide_do_rw_cdrom routine has been called via a request with the
> request flags having the REQ_BLOCK_PC flag set.  The request->data_len
> of this request is set to 0.
> 
> This request is sent to the device and it generates interrupts to
> eventually land it up inside the routine cdrom_newpc_intr.
> 
> At this point the status register on the hardware is set to 0x58 -
> implying, I think that the DRQ_STAT bit is set and that something
> should be sent to the device.
> 
> Normally, because the requested data_len is not zero, the data is
> sent.  In this case however, because the original request had nothing
> to send, the while/if clauses to initiate a new transfer are skipped
> and the routine ends up setting a new interrupt handler address and
> returning to await an interrupt that will never come.

The big question is - what does the original command look like? Just
dumping rq->cmd[0] would be a big help, but really just put code in
sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
that.

> Question: should something validate that the request length is not
> zero earlier, or should there be a check in ide-cd.c, or is it my
> hardware (its a generic cd read/rewriter which announces itself as
> 'CW078D CD-R/RW')

It's hard to know, you would have to parse every command type to verify
if the dxfer_len made sense or not. It's perfectly possible to generate
a command that would hang the drive as you describe above, only to be
aborted after it times out.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-20 19:47 ` Jens Axboe
@ 2004-11-21  0:53   ` Alan Chandler
  2004-11-21  8:56     ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-21  0:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Saturday 20 November 2004 19:47, Jens Axboe wrote:
> On Sat, Nov 20 2004, Alan Chandler wrote:
...
> > Normally, because the requested data_len is not zero, the data is
> > sent.  In this case however, because the original request had nothing
> > to send, the while/if clauses to initiate a new transfer are skipped
> > and the routine ends up setting a new interrupt handler address and
> > returning to await an interrupt that will never come.
>
> The big question is - what does the original command look like? Just
> dumping rq->cmd[0] would be a big help, but really just put code in
> sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
> that.

I haven't dumped the whole request header, but the command (after it has been 
retrieved from the user) and the dxfer_length.  Is there anything else I 
should dump?

Here is the output leading up to the point where ide-cd hangs because the IO 
is just left pending

Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 10
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
Nov 21 00:44:20 kanger kernel: sg_io command length 6
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
Nov 21 00:44:20 kanger kernel: sg_io command length 6
Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
Nov 21 00:47:00 kanger last message repeated 2 times
Nov 21 00:47:40 kanger kernel: hdc: lost interrupt



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-21  0:53   ` Alan Chandler
@ 2004-11-21  8:56     ` Jens Axboe
  2004-11-21 10:25       ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-21  8:56 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Sun, Nov 21 2004, Alan Chandler wrote:
> On Saturday 20 November 2004 19:47, Jens Axboe wrote:
> > On Sat, Nov 20 2004, Alan Chandler wrote:
> ...
> > > Normally, because the requested data_len is not zero, the data is
> > > sent.  In this case however, because the original request had nothing
> > > to send, the while/if clauses to initiate a new transfer are skipped
> > > and the routine ends up setting a new interrupt handler address and
> > > returning to await an interrupt that will never come.
> >
> > The big question is - what does the original command look like? Just
> > dumping rq->cmd[0] would be a big help, but really just put code in
> > sg_io() in block/scsi_ioctl.c to dump the completed sg_io_hdr_t and send
> > that.
> 
> I haven't dumped the whole request header, but the command (after it has been 
> retrieved from the user) and the dxfer_length.  Is there anything else I 
> should dump?

No that's fine, that's all I need.

> Here is the output leading up to the point where ide-cd hangs because the IO 
> is just left pending
> 
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 10
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x3c
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [6] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [7] = 0xfc
> Nov 21 00:44:20 kanger kernel: sg_io command [8] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [9] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 64512
> Nov 21 00:44:20 kanger kernel: sg_io command length 6
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> Nov 21 00:44:20 kanger kernel: sg_io command length 6
> Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
> Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
> Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
> Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
> Nov 21 00:47:00 kanger last message repeated 2 times
> Nov 21 00:47:40 kanger kernel: hdc: lost interrupt

So the last request is a START_STOP unit, which doesn't transfer any
data. If the drive has DRQ_STAT stat set here, it looks very odd. Any
chance you could instrument cdrom_newpc_intr() as well to dump status
bytes and expected transfer lengths from the drive?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-21  8:56     ` Jens Axboe
@ 2004-11-21 10:25       ` Alan Chandler
  2004-11-21 16:13         ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-21 10:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Sunday 21 November 2004 08:56, Jens Axboe wrote:
> On Sun, Nov 21 2004, Alan Chandler wrote:
...
> > Nov 21 00:44:20 kanger kernel: sg_io command length 6
> > Nov 21 00:44:20 kanger kernel: sg_io command [0] = 0x1b
> > Nov 21 00:44:20 kanger kernel: sg_io command [1] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [2] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [3] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io command [4] = 0x3
> > Nov 21 00:44:20 kanger kernel: sg_io command [5] = 0x0
> > Nov 21 00:44:20 kanger kernel: sg_io dxfer_len = 0
> > Nov 21 00:45:00 kanger kernel: hdc: lost interrupt
> > Nov 21 00:45:40 kanger kernel: hdc: lost interrupt
> > Nov 21 00:47:00 kanger last message repeated 2 times
> > Nov 21 00:47:40 kanger kernel: hdc: lost interrupt
>
> So the last request is a START_STOP unit, which doesn't transfer any
> data. If the drive has DRQ_STAT stat set here, it looks very odd. Any
> chance you could instrument cdrom_newpc_intr() as well to dump status
> bytes and expected transfer lengths from the drive?

As below - note I have also got a printk in cdrom_timer_expiry () - but 
nothing there.  I have included a couple of commands before the key one, and 
also a few 'lost interrupt' calls after the lock to show how it then doesn't 
recover. (and I made a spelling error in one of the messages the message - 
seld should mean self).

Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x3c
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 64512
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x3c
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen = 
64512 - dma = 1
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status) 
- dma = 1 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: completed command with status 0x0
Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x3c
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 64512
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x3c
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen = 
64512 - dma = 1
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status) 
- dma = 1 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: completed command with status 0x0
Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io dxfer_len = 0
Nov 21 10:13:44 kanger kernel: scsi_ioctl: about to execute cmd 0x0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_do_block_pc
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_start_packet_command - xferlen = 0 
- dma = 0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status) 
- dma = 0 dma_error = 0x0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading 
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq 
cmd[0] = 0x0 rq len = 0
Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (about to call 
ide_set_handler with seld as rentry) timeout = 40000
Nov 21 10:14:24 kanger kernel: hdc: lost interrupt
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status) 
- dma = 0 dma_error = 0x0
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading 
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq 
cmd[0] = 0x0 rq len = 0
Nov 21 10:14:24 kanger kernel: ide-cd:cdrom_newpc_intr (about to call 
ide_set_handler with seld as rentry) timeout = 40000
Nov 21 10:15:04 kanger kernel: hdc: lost interrupt
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (before decode status) 
- dma = 0 dma_error = 0x0
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after reading 
registers) ireason = 2 len = 0 stat = 0x58
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq 
cmd[0] = 0x0 rq len = 0
Nov 21 10:15:04 kanger kernel: ide-cd:cdrom_newpc_intr (about to call 
ide_set_handler with seld as rentry) timeout = 40000



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-21 10:25       ` Alan Chandler
@ 2004-11-21 16:13         ` Alan Chandler
  2004-11-22  7:52           ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-21 16:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

On Sunday 21 November 2004 10:25, Alan Chandler wrote:
...
> 0x0 Nov 21 10:13:44 kanger kernel: scsi_cmd_ioctl - cmd = 0x2285
> Nov 21 10:13:44 kanger kernel: scsi_ioctl: sg_io cmd [0] = 0x0
...
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (in pio after
> reading registers) ireason = 2 len = 0 stat = 0x58
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (DRQ not clear) - rq
> cmd[0] = 0x0 rq len = 0
> Nov 21 10:13:44 kanger kernel: ide-cd:cdrom_newpc_intr (about to call
> ide_set_handler with seld as rentry) timeout = 40000
> Nov 21 10:14:24 kanger kernel: hdc: lost interrupt

This seems to be some combination of frequently occuring timing problem, and 
the difference treatment in cdrom_newpc_intr to cdrom_pc_intr

I instrumented the latter interrupt routine as well as the first, and got the 
following whilst loading up the system.  It seems that there are several 
times where DRQ is asserted initially, but after a timeout no longer is.

ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 24 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x5a
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 24
ide-cd:cdrpm_pc_intr - from command 0x5a
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 24
hdc: ATAPI 48X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 0 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x0
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 0
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 8 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x25
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 8
ide-cd:cdrpm_pc_intr - from command 0x25
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 8
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 4 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x43
ide-cd:cdrom_do_packet_command
ide-cd:cdrom_start_packet_command - xferlen = 18 - dma = 0
ide-cd:cdrpm_pc_intr - from command 0x3
ide-cd:cdrom_pc_intr - stat = 0x58 ireason = 2 len = 18
ide-cd:cdrpm_pc_intr - from command 0x3
ide-cd:cdrom_pc_intr - stat = 0x50 ireason = 3 len = 18

The other point is that the point of error is not entirely consistent between 
runs.  Occasssionally the 0x0 command succeeds and moves on to the 0x1b 
command

[It might also be worth pointing out here that I am using 2.6.9 and tried with 
2.6.10-rc2 and could not even get the system to get past the cd-rom 
initialisation stage.  I haven't been able to find out what might be 
different between the two ways of initialising the system - I can't see much 
difference inside ide-cd.c]


-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-21 16:13         ` Alan Chandler
@ 2004-11-22  7:52           ` Alan Chandler
  2004-11-22  8:01             ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22  7:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

On Sunday 21 November 2004 16:13, Alan Chandler wrote:
...
>
> This seems to be some combination of frequently occuring timing problem,
> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr

I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of DRQ 
being set when there was no data to transfer disappeared.  It appears that my 
hardware is too slow.

I have been reading the ATA/ATAPI - 6 spec, and it implies that the state of 
DRQ line need one pio cycle before being correct and that you should read the 
alternative status register to achieve this.  I tried a simple

HWIF(drive)->INB( IDE_ALTSTATUS_REG);

But that made no difference.



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22  7:52           ` Alan Chandler
@ 2004-11-22  8:01             ` Jens Axboe
  2004-11-22 10:30               ` Alan Chandler
  2004-11-23 18:34               ` Jeff Garzik
  0 siblings, 2 replies; 37+ messages in thread
From: Jens Axboe @ 2004-11-22  8:01 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Mon, Nov 22 2004, Alan Chandler wrote:
> On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> ...
> >
> > This seems to be some combination of frequently occuring timing problem,
> > and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr
> 
> I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> DRQ being set when there was no data to transfer disappeared.  It
> appears that my hardware is too slow.
> 
> I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> state of DRQ line need one pio cycle before being correct and that you
> should read the alternative status register to achieve this.  I tried
> a simple
> 
> HWIF(drive)->INB( IDE_ALTSTATUS_REG);
> 
> But that made no difference.

ALTSTATUS read should be fine as well, but the implicit delay is
probably better.

Is this enough to fix it? For ->drq_interrupt we already should have
an adequate delay, Alan fixed this one recently.

===== drivers/ide/ide-cd.c 1.97 vs edited =====
--- 1.97/drivers/ide/ide-cd.c	2004-11-07 02:54:41 +01:00
+++ edited/drivers/ide/ide-cd.c	2004-11-22 08:58:15 +01:00
@@ -890,8 +890,13 @@
 		ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, cdrom_timer_expiry);
 		return ide_started;
 	} else {
+		unsigned long flags;
+
 		/* packet command */
-		HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+		spin_lock_irqsave(&ide_lock, flags);
+		HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+		ndelay(400);
+		spin_unlock_irqrestore(&ide_lock, flags);
 		return (*handler) (drive);
 	}
 }

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22  8:01             ` Jens Axboe
@ 2004-11-22 10:30               ` Alan Chandler
  2004-11-22 10:51                 ` Jens Axboe
  2004-11-23 18:34               ` Jeff Garzik
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22 10:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Jens Axboe writes: 

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> On Sunday 21 November 2004 16:13, Alan Chandler wrote:
>> ...
>> >
>> > This seems to be some combination of frequently occuring timing problem,
>> > and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr 
>> 
>> I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
>> DRQ being set when there was no data to transfer disappeared.  It
>> appears that my hardware is too slow. 
>> 
>> I have been reading the ATA/ATAPI - 6 spec, and it implies that the
>> state of DRQ line need one pio cycle before being correct and that you
>> should read the alternative status register to achieve this.  I tried
>> a simple 
>> 
>> HWIF(drive)->INB( IDE_ALTSTATUS_REG); 
>> 
>> But that made no difference.
> 
> ALTSTATUS read should be fine as well, but the implicit delay is
> probably better. 
> 

I don't know why, but the ALTSTATUS read did NOT work when I tried it 
yesterday (am currently at work using web mail to access my mail - can't do 
more until this evening).  Its possible I put it in the wrong place (ie 
after the cdrom_decode_status call, but I don't think so. 

The ndelay(400) did work. 

> Is this enough to fix it? For ->drq_interrupt we already should have
> an adequate delay, Alan fixed this one recently. 
> 

Yes, I had included this patch quite early in my process of tracking the 
problem down (when I corrected it like you have (add the drive parameter to 
the OUTBSYNC macro like you have, you also need to declare an unsigned long 
flags at the head of the routine that was also not in that patch).  Indeed 
it was this that was the inspiration for the 400 nanosecs in my change.  I 
have no idea what the right number should be 


Alan Chandler
alan@chandlerfamily.org.uk 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 10:30               ` Alan Chandler
@ 2004-11-22 10:51                 ` Jens Axboe
  2004-11-22 11:29                   ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-22 10:51 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes: 
> 
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> >>...
> >>>
> >>> This seems to be some combination of frequently occuring timing problem,
> >>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr 
> >>
> >>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> >>DRQ being set when there was no data to transfer disappeared.  It
> >>appears that my hardware is too slow. 
> >>
> >>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> >>state of DRQ line need one pio cycle before being correct and that you
> >>should read the alternative status register to achieve this.  I tried
> >>a simple 
> >>
> >>HWIF(drive)->INB( IDE_ALTSTATUS_REG); 
> >>
> >>But that made no difference.
> >
> >ALTSTATUS read should be fine as well, but the implicit delay is
> >probably better. 
> >
> 
> I don't know why, but the ALTSTATUS read did NOT work when I tried it 
> yesterday (am currently at work using web mail to access my mail - can't do 
> more until this evening).  Its possible I put it in the wrong place (ie 
> after the cdrom_decode_status call, but I don't think so. 
> 
> The ndelay(400) did work. 
> 
> >Is this enough to fix it? For ->drq_interrupt we already should have
> >an adequate delay, Alan fixed this one recently. 
> >
> 
> Yes, I had included this patch quite early in my process of tracking
> the problem down (when I corrected it like you have (add the drive
> parameter to the OUTBSYNC macro like you have, you also need to
> declare an unsigned long flags at the head of the routine that was
> also not in that patch).  Indeed it was this that was the inspiration
> for the 400 nanosecs in my change.  I have no idea what the right
> number should be 

400ns is the correctl value. Your writing is a little unclear to me -
did it work or not, with that change alone?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 10:51                 ` Jens Axboe
@ 2004-11-22 11:29                   ` Alan Chandler
  2004-11-22 11:31                     ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22 11:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Jens Axboe writes: 

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> Jens Axboe writes:  
>> 
>> >On Mon, Nov 22 2004, Alan Chandler wrote:
>> >>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
>> >>...
>> >>>
>> >>> This seems to be some combination of frequently occuring timing problem,
>> >>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr 
>> >>
>> >>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
>> >>DRQ being set when there was no data to transfer disappeared.  It
>> >>appears that my hardware is too slow. 
>> >>
>> >>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
>> >>state of DRQ line need one pio cycle before being correct and that you
>> >>should read the alternative status register to achieve this.  I tried
>> >>a simple 
>> >>
>> >>HWIF(drive)->INB( IDE_ALTSTATUS_REG); 
>> >>
>> >>But that made no difference.
>> >
>> >ALTSTATUS read should be fine as well, but the implicit delay is
>> >probably better. 
>> > 
>> 
>> I don't know why, but the ALTSTATUS read did NOT work when I tried it 
>> yesterday (am currently at work using web mail to access my mail - can't do 
>> more until this evening).  Its possible I put it in the wrong place (ie 
>> after the cdrom_decode_status call, but I don't think so.  
>> 
>> The ndelay(400) did work.  
>> 
>> >Is this enough to fix it? For ->drq_interrupt we already should have
>> >an adequate delay, Alan fixed this one recently. 
>> > 
>> 
>> Yes, I had included this patch quite early in my process of tracking
>> the problem down (when I corrected it like you have (add the drive
>> parameter to the OUTBSYNC macro like you have, you also need to
>> declare an unsigned long flags at the head of the routine that was
>> also not in that patch).  Indeed it was this that was the inspiration
>> for the 400 nanosecs in my change.  I have no idea what the right
>> number should be 
> 
> 400ns is the correctl value. Your writing is a little unclear to me -
> did it work or not, with that change alone? 
> 

To be clear ... 


I have modified ide-cd.c with 

1) ndelay(400) at the head of cdrom_newpc_intr() 

2) Alan Cox's patch in the place he originally identified for it to go 

3) Some printk's in cdrom_newpc_intr() after the point where it reads the 
status and IREASON and length registers and just for the purposes of 
diagnostics. 

With only those changes it now works. 

 

Alan Chandler
alan@chandlerfamily.org.uk 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 11:29                   ` Alan Chandler
@ 2004-11-22 11:31                     ` Jens Axboe
  2004-11-22 12:53                       ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-22 11:31 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes: 
> 
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>Jens Axboe writes:  
> >>
> >>>On Mon, Nov 22 2004, Alan Chandler wrote:
> >>>>On Sunday 21 November 2004 16:13, Alan Chandler wrote:
> >>>>...
> >>>>>
> >>>>> This seems to be some combination of frequently occuring timing 
> >>problem,
> >>>>> and the difference treatment in cdrom_newpc_intr to cdrom_pc_intr 
> >>>>
> >>>>I put a ndelay(400) at the head of cdrom_newpc_intr and the problem of
> >>>>DRQ being set when there was no data to transfer disappeared.  It
> >>>>appears that my hardware is too slow. 
> >>>>
> >>>>I have been reading the ATA/ATAPI - 6 spec, and it implies that the
> >>>>state of DRQ line need one pio cycle before being correct and that you
> >>>>should read the alternative status register to achieve this.  I tried
> >>>>a simple 
> >>>>
> >>>>HWIF(drive)->INB( IDE_ALTSTATUS_REG); 
> >>>>
> >>>>But that made no difference.
> >>>
> >>>ALTSTATUS read should be fine as well, but the implicit delay is
> >>>probably better. 
> >>> 
> >>
> >>I don't know why, but the ALTSTATUS read did NOT work when I tried it 
> >>yesterday (am currently at work using web mail to access my mail - can't 
> >>do more until this evening).  Its possible I put it in the wrong place 
> >>(ie after the cdrom_decode_status call, but I don't think so.  
> >>
> >>The ndelay(400) did work.  
> >>
> >>>Is this enough to fix it? For ->drq_interrupt we already should have
> >>>an adequate delay, Alan fixed this one recently. 
> >>> 
> >>
> >>Yes, I had included this patch quite early in my process of tracking
> >>the problem down (when I corrected it like you have (add the drive
> >>parameter to the OUTBSYNC macro like you have, you also need to
> >>declare an unsigned long flags at the head of the routine that was
> >>also not in that patch).  Indeed it was this that was the inspiration
> >>for the 400 nanosecs in my change.  I have no idea what the right
> >>number should be 
> >
> >400ns is the correctl value. Your writing is a little unclear to me -
> >did it work or not, with that change alone? 
> >
> 
> To be clear ... 
> 
> 
> I have modified ide-cd.c with 
> 
> 1) ndelay(400) at the head of cdrom_newpc_intr() 
> 
> 2) Alan Cox's patch in the place he originally identified for it to go 
> 
> 3) Some printk's in cdrom_newpc_intr() after the point where it reads the 
> status and IREASON and length registers and just for the purposes of 
> diagnostics. 
> 
> With only those changes it now works. 

You are not answering my question :-)

Here's is Alans patch as I posted some mails ago. Does it work with that
alone?? I'm curious of it is enough. It should not be necessary to incur
extra delay in the interrupt handler, if it is invoked from a real irq.

===== drivers/ide/ide-cd.c 1.97 vs edited =====
--- 1.97/drivers/ide/ide-cd.c	2004-11-07 02:54:41 +01:00
+++ edited/drivers/ide/ide-cd.c	2004-11-22 08:58:15 +01:00
@@ -890,8 +890,13 @@
 		ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, cdrom_timer_expiry);
 		return ide_started;
 	} else {
+		unsigned long flags;
+
 		/* packet command */
-		HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+		spin_lock_irqsave(&ide_lock, flags);
+		HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+		ndelay(400);
+		spin_unlock_irqrestore(&ide_lock, flags);
 		return (*handler) (drive);
 	}
 }

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 11:31                     ` Jens Axboe
@ 2004-11-22 12:53                       ` Alan Chandler
  2004-11-22 13:02                         ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22 12:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Chandler, linux-kernel

Jens Axboe writes: 

> On Mon, Nov 22 2004, Alan Chandler wrote:
>> Jens Axboe writes: 

>> >400ns is the correctl value. Your writing is a little unclear to me -
>> >did it work or not, with that change alone? 
>> > 
>> 
>> To be clear ...  
>> 
>> 
>> I have modified ide-cd.c with  
>> 
>> 1) ndelay(400) at the head of cdrom_newpc_intr()  
>> 
>> 2) Alan Cox's patch in the place he originally identified for it to go  
>> 
>> 3) Some printk's in cdrom_newpc_intr() after the point where it reads the 
>> status and IREASON and length registers and just for the purposes of 
>> diagnostics.  
>> 
>> With only those changes it now works. 
> 
> You are not answering my question :-) 
> 
> Here's is Alans patch as I posted some mails ago. Does it work with that
> alone?? I'm curious of it is enough. It should not be necessary to incur
> extra delay in the interrupt handler, if it is invoked from a real irq.

Sorry, I misunderstood what you meant.  I presume you think that the 
interrupt may be triggered immediately the command packet has been sent but 
before 400ns delay had occurred. 

NO - with Alan's patch alone, this did not work. 

The delay seesm to be needed in the path between the interrupt occuring and 
the IDE_STATUS_REG being read. 

I had seen an note on a web site that said that there was two delays 
required in the ATA/ATAPI spec - the 400ns which Alan's patch deals with and 
a shorter delay (one PIO cycle) between busy being cleared and DRQ reaching 
the correct state where the technique had been to read the ALTSTATUS 
register.  That was why I had tried that approach but found it not to work. 

(I have subsequently downloaded a copy of the full spec and haven't been 
able to find this - but then its just short of 500 pages of dense text:-)). 

Thinking about it now, I tried the ALTSTATUS delay before applying Alan's 
patch, so maybe its the some of the two delays that maybe necessary.  If you 
think its appropriate I will try that again this evening. 

 --
Alan Chandler
alan@chandlerfamily.org.uk 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 12:53                       ` Alan Chandler
@ 2004-11-22 13:02                         ` Jens Axboe
  2004-11-22 19:19                           ` Alan Chandler
  2004-11-24 23:19                           ` ide-cd problem Alan Cox
  0 siblings, 2 replies; 37+ messages in thread
From: Jens Axboe @ 2004-11-22 13:02 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Mon, Nov 22 2004, Alan Chandler wrote:
> Jens Axboe writes: 
> 
> >On Mon, Nov 22 2004, Alan Chandler wrote:
> >>Jens Axboe writes: 
> 
> >>>400ns is the correctl value. Your writing is a little unclear to me -
> >>>did it work or not, with that change alone? 
> >>> 
> >>
> >>To be clear ...  
> >>
> >>
> >>I have modified ide-cd.c with  
> >>
> >>1) ndelay(400) at the head of cdrom_newpc_intr()  
> >>
> >>2) Alan Cox's patch in the place he originally identified for it to go  
> >>
> >>3) Some printk's in cdrom_newpc_intr() after the point where it reads the 
> >>status and IREASON and length registers and just for the purposes of 
> >>diagnostics.  
> >>
> >>With only those changes it now works. 
> >
> >You are not answering my question :-) 
> >
> >Here's is Alans patch as I posted some mails ago. Does it work with that
> >alone?? I'm curious of it is enough. It should not be necessary to incur
> >extra delay in the interrupt handler, if it is invoked from a real irq.
> 
> Sorry, I misunderstood what you meant.  I presume you think that the 
> interrupt may be triggered immediately the command packet has been sent but 
> before 400ns delay had occurred. 
> 
> NO - with Alan's patch alone, this did not work. 
> 
> The delay seesm to be needed in the path between the interrupt occuring and 
> the IDE_STATUS_REG being read. 
> 
> I had seen an note on a web site that said that there was two delays 
> required in the ATA/ATAPI spec - the 400ns which Alan's patch deals with 
> and a shorter delay (one PIO cycle) between busy being cleared and DRQ 
> reaching the correct state where the technique had been to read the 
> ALTSTATUS register.  That was why I had tried that approach but found it 
> not to work. 
> (I have subsequently downloaded a copy of the full spec and haven't been 
> able to find this - but then its just short of 500 pages of dense text:-)). 
> 
> Thinking about it now, I tried the ALTSTATUS delay before applying Alan's 
> patch, so maybe its the some of the two delays that maybe necessary.  If 
> you think its appropriate I will try that again this evening. 

I think the more correct patch is the following. It seems I was wrong in
assuming that the ide_intr() path already waited 400ns for us, I think
this should work for you. Can you test it?

===== drivers/ide/ide-iops.c 1.31 vs edited =====
--- 1.31/drivers/ide/ide-iops.c	2004-11-01 18:06:50 +01:00
+++ edited/drivers/ide/ide-iops.c	2004-11-22 13:59:27 +01:00
@@ -476,10 +476,8 @@
 	if (drive->waiting_for_dma)
 		return hwif->ide_dma_test_irq(drive);
 
-#if 0
 	/* need to guarantee 400ns since last command was issued */
-	udelay(1);
-#endif
+	ndelay(400);
 
 #ifdef CONFIG_IDEPCI_SHARE_IRQ
 	/*

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 13:02                         ` Jens Axboe
@ 2004-11-22 19:19                           ` Alan Chandler
  2004-11-22 23:48                             ` Alan Chandler
  2004-11-24 23:19                           ` ide-cd problem Alan Cox
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22 19:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Monday 22 November 2004 13:02, Jens Axboe wrote:

>
> I think the more correct patch is the following. It seems I was wrong in
> assuming that the ide_intr() path already waited 400ns for us, I think
> this should work for you. Can you test it?

Bad news - it didn't work.

It certainly looks as though it should - I am trying to find out what not.


>
> ===== drivers/ide/ide-iops.c 1.31 vs edited =====
> --- 1.31/drivers/ide/ide-iops.c 2004-11-01 18:06:50 +01:00
> +++ edited/drivers/ide/ide-iops.c 2004-11-22 13:59:27 +01:00
> @@ -476,10 +476,8 @@
>   if (drive->waiting_for_dma)
>    return hwif->ide_dma_test_irq(drive);
>
> -#if 0
>   /* need to guarantee 400ns since last command was issued */
> - udelay(1);
> -#endif
> + ndelay(400);
>
>  #ifdef CONFIG_IDEPCI_SHARE_IRQ
>   /*

-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 19:19                           ` Alan Chandler
@ 2004-11-22 23:48                             ` Alan Chandler
  2004-11-23  7:13                               ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-22 23:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

On Monday 22 November 2004 19:19, Alan Chandler wrote:
> On Monday 22 November 2004 13:02, Jens Axboe wrote:
> > I think the more correct patch is the following. It seems I was wrong in
> > assuming that the ide_intr() path already waited 400ns for us, I think
> > this should work for you. Can you test it?
>
> Bad news - it didn't work.
>
> It certainly looks as though it should - I am trying to find out what not.
>

I meant of course "I am trying to find out why not".

If I make the delay 600ns it works - I guess my hardware is a little off spec.

If I leave the original udelay(1) in there, I get irq timeouts all over the 
place.



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 23:48                             ` Alan Chandler
@ 2004-11-23  7:13                               ` Alan Chandler
  2004-11-23 14:51                                 ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-23  7:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Monday 22 November 2004 23:48, Alan Chandler wrote:
...
> If I make the delay 600ns it works - I guess my hardware is a little off
> spec.
>

I did a binary chop on the value to find the cut off point between what works 
and what doesn't.  Its approx 535ns (534 failed, 537 worked).

All this was with 2.6.9, 

2.6.10-rc2 is still failing during the cd initialisation on boot.  Here I 
tried with bot 600ns and 700ns delays in drive_is_ready, but both values fail 
with what looks like missed interrupts.  I'll try instrumenting this a bit 
more to find out what is happening.

-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-23  7:13                               ` Alan Chandler
@ 2004-11-23 14:51                                 ` Jens Axboe
  2004-11-23 21:49                                   ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-23 14:51 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel

On Tue, Nov 23 2004, Alan Chandler wrote:
> On Monday 22 November 2004 23:48, Alan Chandler wrote:
> ...
> > If I make the delay 600ns it works - I guess my hardware is a little off
> > spec.
> >
> 
> I did a binary chop on the value to find the cut off point between what works 
> and what doesn't.  Its approx 535ns (534 failed, 537 worked).
> 
> All this was with 2.6.9, 
> 
> 2.6.10-rc2 is still failing during the cd initialisation on boot.  Here I 
> tried with bot 600ns and 700ns delays in drive_is_ready, but both values fail 
> with what looks like missed interrupts.  I'll try instrumenting this a bit 
> more to find out what is happening.

It's getting more and more interesting! Look forward to hearing what
your instrumentation brings.

There are other reports of acpi causing interrupt problems with cdroms
in 2.6.10-rc2, so it would be best if you stuck to 2.6.9 for testing
this particular problem.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22  8:01             ` Jens Axboe
  2004-11-22 10:30               ` Alan Chandler
@ 2004-11-23 18:34               ` Jeff Garzik
  2004-11-23 19:13                 ` Jens Axboe
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2004-11-23 18:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Chandler, linux-kernel

Jens Axboe wrote:
>  		/* packet command */
> -		HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> +		spin_lock_irqsave(&ide_lock, flags);
> +		HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
> +		ndelay(400);
> +		spin_unlock_irqrestore(&ide_lock, flags);
>  		return (*handler) (drive);


FWIW this ndelay(400) is required by the spec, when you submit a command.

	Jeff



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-23 18:34               ` Jeff Garzik
@ 2004-11-23 19:13                 ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2004-11-23 19:13 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan Chandler, linux-kernel

On Tue, Nov 23 2004, Jeff Garzik wrote:
> Jens Axboe wrote:
> > 		/* packet command */
> >-		HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> >+		spin_lock_irqsave(&ide_lock, flags);
> >+		HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
> >+		ndelay(400);
> >+		spin_unlock_irqrestore(&ide_lock, flags);
> > 		return (*handler) (drive);
> 
> 
> FWIW this ndelay(400) is required by the spec, when you submit a command.

Yes, I know it isn't pulled out of thin air :)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-23 14:51                                 ` Jens Axboe
@ 2004-11-23 21:49                                   ` Alan Chandler
  2004-11-26 23:39                                     ` Alan Chandler
  2004-11-30  8:59                                     ` Alan Chandler
  0 siblings, 2 replies; 37+ messages in thread
From: Alan Chandler @ 2004-11-23 21:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tuesday 23 November 2004 14:51, Jens Axboe wrote:
> On Tue, Nov 23 2004, Alan Chandler wrote:
> > On Monday 22 November 2004 23:48, Alan Chandler wrote:
> > ...
> >
> > > If I make the delay 600ns it works - I guess my hardware is a little
> > > off spec.
> >
> > I did a binary chop on the value to find the cut off point between what
> > works and what doesn't.  Its approx 535ns (534 failed, 537 worked).
> >
> > All this was with 2.6.9,
> >
> > 2.6.10-rc2 is still failing during the cd initialisation on boot.  Here I
> > tried with bot 600ns and 700ns delays in drive_is_ready, but both values
> > fail with what looks like missed interrupts.  I'll try instrumenting this
> > a bit more to find out what is happening.
>
> It's getting more and more interesting! Look forward to hearing what
> your instrumentation brings.
>
> There are other reports of acpi causing interrupt problems with cdroms
> in 2.6.10-rc2, so it would be best if you stuck to 2.6.9 for testing
> this particular problem.

There is good and bad news related to 2.6.10-rc2

The good news is that the acpi problem was the cause of the startup issues.  
adding pci=noacpi to the boot command line fixed that.

The bad news is that with the delay at 800ns in drive_is_ready() I am getting 
the exact same symptoms I got with 2.6.9 before upping the delay to over 
540ns.

Before, I thought my hardware was a little out of spec - now I think there is 
something else at play here.

Firstly, the symptoms are the same between 2.6.9 and 2.6.10-rc2.  The halt 
seem to always be in exactly the same place. If it was a timing problem I 
would have thought it would have varied.

Secondly, the command before seems to have an expectation of a 2048 transfer 
rather than the 0 in the command, before the problem and then you get the 
strange DRQ=1 but 0 in the len register.

Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x0
Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50 
ireason=3 len=2048 rq len=0
Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x1b
Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58 
ireason=2 len=0 rq len=0

I have got myself a copy of the ATA/ATAPI spec (document T13/1410D revision 
3), I think I need to read more of it to understand what is the code is 
trying to do.



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-22 13:02                         ` Jens Axboe
  2004-11-22 19:19                           ` Alan Chandler
@ 2004-11-24 23:19                           ` Alan Cox
  2004-11-25 15:29                             ` Jens Axboe
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Cox @ 2004-11-24 23:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Chandler, Linux Kernel Mailing List

> the following. It seems I was wrong in
> assuming that the ide_intr() path already waited 400ns for us, I think
> this should work for you. Can you test it?

The locking on ide_execute_command and friends is supposed to ensure
this, can we please keep it out of the IRQ handler once its debugged ?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-24 23:19                           ` ide-cd problem Alan Cox
@ 2004-11-25 15:29                             ` Jens Axboe
  2004-11-25 16:25                               ` Alan Cox
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-25 15:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Chandler, Linux Kernel Mailing List

On Wed, Nov 24 2004, Alan Cox wrote:
> > the following. It seems I was wrong in
> > assuming that the ide_intr() path already waited 400ns for us, I think
> > this should work for you. Can you test it?
> 
> The locking on ide_execute_command and friends is supposed to ensure
> this, can we please keep it out of the IRQ handler once its debugged ?

Something is funky with this drive, it requires an extra delay after it
has raised an interrupt. But we can restrict it to ide-cd once it's
fully understood.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-25 15:29                             ` Jens Axboe
@ 2004-11-25 16:25                               ` Alan Cox
  2004-11-25 18:12                                 ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Cox @ 2004-11-25 16:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Chandler, Linux Kernel Mailing List

On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> Something is funky with this drive, it requires an extra delay after it
> has raised an interrupt. But we can restrict it to ide-cd once it's
> fully understood.

You are assuming its a drive issue. Some controllers have errata around
this area that might be involved too. We can certainly stick a quirk for
IRQ delay into the drive tho


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-25 16:25                               ` Alan Cox
@ 2004-11-25 18:12                                 ` Jens Axboe
  2004-11-25 18:45                                   ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2004-11-25 18:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Chandler, Linux Kernel Mailing List

On Thu, Nov 25 2004, Alan Cox wrote:
> On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> > Something is funky with this drive, it requires an extra delay after it
> > has raised an interrupt. But we can restrict it to ide-cd once it's
> > fully understood.
> 
> You are assuming its a drive issue. Some controllers have errata around
> this area that might be involved too. We can certainly stick a quirk for
> IRQ delay into the drive tho

That's true - Alan (Chandler), can you provide an lspci of the system as
well?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-25 18:12                                 ` Jens Axboe
@ 2004-11-25 18:45                                   ` Alan Chandler
  0 siblings, 0 replies; 37+ messages in thread
From: Alan Chandler @ 2004-11-25 18:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Cox, Linux Kernel Mailing List

On Thursday 25 November 2004 18:12, Jens Axboe wrote:
> On Thu, Nov 25 2004, Alan Cox wrote:
> > On Iau, 2004-11-25 at 15:29, Jens Axboe wrote:
> > > Something is funky with this drive, it requires an extra delay after it
> > > has raised an interrupt. But we can restrict it to ide-cd once it's
> > > fully understood.
> >
> > You are assuming its a drive issue. Some controllers have errata around
> > this area that might be involved too. We can certainly stick a quirk for
> > IRQ delay into the drive tho
>
> That's true - Alan (Chandler), can you provide an lspci of the system as
> well?

lspci -vvx output

0000:00:00.0 Host bridge: Silicon Integrated Systems [SiS] 735 Host (rev 01)
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort+ >SERR- <PERR-
 Latency: 32
 Region 0: Memory at d0000000 (32-bit, non-prefetchable) [size=64M]
 Capabilities: <available only to root>
00: 39 10 35 07 07 00 10 22 01 00 00 06 00 20 80 00
10: 00 00 00 d0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 c0 00 00 00 00 00 00 00 00 00 00 00

0000:00:01.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI 
bridge (AGP) (prog-if 00 [Normal decode])
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64
 Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
 I/O behind bridge: 0000b000-0000bfff
 Memory behind bridge: cfe00000-cfefffff
 Prefetchable memory behind bridge: afc00000-cfcfffff
 BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: 39 10 01 00 07 01 00 00 00 00 04 06 00 40 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 40 b0 b0 00 20
20: e0 cf e0 cf c0 af c0 cf 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a 00

0000:00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS85C503/5513 (LPC 
Bridge)
 Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 0
00: 39 10 18 00 0f 00 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Interrupt: pin B routed to IRQ 5
 Region 4: I/O ports at 0c00 [size=32]
00: 39 10 16 00 01 00 00 02 00 00 05 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00

0000:00:02.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 
Controller (rev 07) (prog-if 10 [OHCI])
 Subsystem: Elitegroup Computer Systems: Unknown device 0a14
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64 (20000ns max), Cache Line Size: 0x08 (32 bytes)
 Interrupt: pin D routed to IRQ 11
 Region 0: Memory at cfffe000 (32-bit, non-prefetchable) [size=4K]
00: 39 10 01 70 17 01 80 02 07 10 03 0c 08 40 00 00
10: 00 e0 ff cf 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 19 10 14 0a
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 04 00 50

0000:00:02.3 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 
Controller (rev 07) (prog-if 10 [OHCI])
 Subsystem: Elitegroup Computer Systems: Unknown device 0a14
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR+
 Latency: 64 (20000ns max), Cache Line Size: 0x08 (32 bytes)
 Interrupt: pin A routed to IRQ 10
 Region 0: Memory at cffff000 (32-bit, non-prefetchable) [size=4K]
00: 39 10 01 70 17 01 80 82 07 10 03 0c 08 40 00 00
10: 00 f0 ff cf 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 19 10 14 0a
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 50

0000:00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev 
d0) (prog-if 80 [Master])
 Subsystem: Silicon Integrated Systems [SiS] SiS5513 EIDE Controller (A,B 
step)
 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 128
 Region 4: I/O ports at ff00 [size=16]
00: 39 10 13 55 05 00 00 00 d0 80 01 01 00 80 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 ff 00 00 00 00 00 00 00 00 00 00 39 10 13 55
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 
07)
 Subsystem: Creative Labs SBLive! Player 5.1
 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64 (500ns min, 5000ns max)
 Interrupt: pin A routed to IRQ 5
 Region 0: I/O ports at d800 [size=32]
 Capabilities: <available only to root>
00: 02 11 02 00 05 01 90 02 07 00 01 04 00 40 80 00
10: 01 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 61 80
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 02 14

0000:00:0b.1 Input device controller: Creative Labs SB Live! MIDI/Game Port 
(rev 07)
 Subsystem: Creative Labs Gameport Joystick
 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64
 Region 0: I/O ports at dc00 [size=8]
 Capabilities: <available only to root>
00: 02 11 02 70 05 01 90 02 07 00 80 09 00 40 80 00
10: 01 dc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 20 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.0 Ethernet controller: 3Com Corporation 3cSOHO100-TX Hurricane (rev 
30)
 Subsystem: 3Com Corporation 3cSOHO100-TX Hurricane
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64 (2500ns min, 2500ns max), Cache Line Size: 0x08 (32 bytes)
 Interrupt: pin A routed to IRQ 11
 Region 0: I/O ports at d400 [size=128]
 Region 1: Memory at cfffdf80 (32-bit, non-prefetchable) [size=128]
 Expansion ROM at cffc0000 [disabled] [size=128K]
 Capabilities: <available only to root>
00: b7 10 46 76 17 01 10 02 30 00 00 02 08 40 00 00
10: 01 d4 00 00 80 df ff cf 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 46 76
30: 00 00 fc cf dc 00 00 00 00 00 00 00 0b 01 0a 0a

0000:01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 
9200] (rev 01) (prog-if 00 [VGA])
 Subsystem: Giga-byte Technology: Unknown device 4018
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
 Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64 (2000ns min), Cache Line Size: 0x08 (32 bytes)
 Interrupt: pin A routed to IRQ 11
 Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
 Region 1: I/O ports at b800 [size=256]
 Region 2: Memory at cfef0000 (32-bit, non-prefetchable) [size=64K]
 Expansion ROM at cfec0000 [disabled] [size=128K]
 Capabilities: <available only to root>
00: 02 10 61 59 07 01 b0 02 01 00 00 03 08 40 80 00
10: 08 00 00 c0 01 b8 00 00 00 00 ef cf 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 18 40
30: 00 00 ec cf 58 00 00 00 00 00 00 00 00 01 08 00

0000:01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200] 
(Secondary) (rev 01)
 Subsystem: Giga-byte Technology: Unknown device 4019
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
 Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- 
<MAbort- >SERR- <PERR-
 Latency: 64 (2000ns min), Cache Line Size: 0x08 (32 bytes)
 Region 0: Memory at b8000000 (32-bit, prefetchable) [size=128M]
 Region 1: Memory at cfee0000 (32-bit, non-prefetchable) [size=64K]
 Capabilities: <available only to root>
00: 02 10 41 59 07 00 b0 02 01 00 80 03 08 40 00 00
10: 08 00 00 b8 00 00 ee cf 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 19 40
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 08 00


-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-23 21:49                                   ` Alan Chandler
@ 2004-11-26 23:39                                     ` Alan Chandler
  2004-11-29 17:29                                       ` Bill Davidsen
  2004-11-30  8:59                                     ` Alan Chandler
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-26 23:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tuesday 23 November 2004 21:49, Alan Chandler wrote:

>
> Before, I thought my hardware was a little out of spec - now I think there
> is something else at play here.
>

Firstly, I think there might be another race condition like the one Alan Cox 
found.  I attach a patch below with the fix for that (against 2.6.10-rc2, an 
including Alan's patch)   I'm not 100% sure its necessary, but it seems fix a 
variation I have been seeing.

With it in place, and apart from the ongoing issue - see below, I have managed 
to remove the delay in drive_is_ready() altogether without any ill effects.

[my reading of the ATA spec is that 400ns is needed after reading the status 
reg before IRQ is removed, I had wondered whether it would be better to 
record the time here and then check whether we had used up the 400ns just 
before returning from the interrupt state]

> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50
> ireason=3 len=2048 rq len=0
..
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58
> ireason=2 len=0 rq len=0

I think these two lines hold the crux of the problem.  They are the result of 
a printk in cdrom_newpc_intr just after reading the interrupt reason register 
and byte count registers.

However, I have been unable to get any closer as to why DRQ gets set.

The patch

--- ide-cd.c 2004-11-26 20:29:59.000000000 +0000
+++ ide-cd.patch.c 2004-11-26 20:51:23.000000000 +0000
@@ -890,8 +890,12 @@
   ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, 
cdrom_timer_expiry);
   return ide_started;
  } else {
+  unsigned long flags; 
+  spin_lock_irqsave(&ide_lock, flags);
+  HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+  ndelay(400);
+  spin_unlock_irqrestore(&ide_lock, flags);
   /* packet command */
-  HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
   return (*handler) (drive);
  }
 }
@@ -938,8 +942,12 @@
   cmd_len = ATAPI_MIN_CDB_BYTES;
 
  /* Send the command to the device. */
+ unsigned long flags; 
+ spin_lock_irqsave(&ide_lock, flags);
  HWIF(drive)->atapi_output_bytes(drive, rq->cmd, cmd_len);
-
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
+ 
  /* Start the DMA if need be */
  if (info->dma)
   hwif->dma_start(drive);

-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-26 23:39                                     ` Alan Chandler
@ 2004-11-29 17:29                                       ` Bill Davidsen
  0 siblings, 0 replies; 37+ messages in thread
From: Bill Davidsen @ 2004-11-29 17:29 UTC (permalink / raw)
  To: Alan Chandler; +Cc: Jens Axboe, linux-kernel

Alan Chandler wrote:
> On Tuesday 23 November 2004 21:49, Alan Chandler wrote:
> 
> 
>>Before, I thought my hardware was a little out of spec - now I think there
>>is something else at play here.
>>
> 
> 
> Firstly, I think there might be another race condition like the one Alan Cox 
> found.  I attach a patch below with the fix for that (against 2.6.10-rc2, an 
> including Alan's patch)   I'm not 100% sure its necessary, but it seems fix a 
> variation I have been seeing.
> 
> With it in place, and apart from the ongoing issue - see below, I have managed 
> to remove the delay in drive_is_ready() altogether without any ill effects.
> 
> [my reading of the ATA spec is that 400ns is needed after reading the status 
> reg before IRQ is removed, I had wondered whether it would be better to 
> record the time here and then check whether we had used up the 400ns just 
> before returning from the interrupt state]

The method is sound, I did some industrial control in a previous job and 
used a back-to-back timer in a similar way. Just before starting an 
operation I checked to see that it wasn't too soon and did a delay if so.

Depending on the path through the code, it may be easier to get right by 
just putting in the delay.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem
  2004-11-23 21:49                                   ` Alan Chandler
  2004-11-26 23:39                                     ` Alan Chandler
@ 2004-11-30  8:59                                     ` Alan Chandler
  2004-12-10 21:32                                       ` ide-cd problem revisited - more brainpower needed Alan Chandler
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-11-30  8:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tuesday 23 November 2004 21:49, Alan Chandler wrote:
> On Tuesday 23 November 2004 14:51, Jens Axboe wrote:
> > On Tue, Nov 23 2004, Alan Chandler wrote:
> > > On Monday 22 November 2004 23:48, Alan Chandler wrote:
> > > ...
> > >
> > > > If I make the delay 600ns it works - I guess my hardware is a little
> > > > off spec.
...
> There is good and bad news related to 2.6.10-rc2
>
> The good news is that the acpi problem was the cause of the startup issues.
> adding pci=noacpi to the boot command line fixed that.
>
> The bad news is that with the delay at 800ns in drive_is_ready() I am
> getting the exact same symptoms I got with 2.6.9 before upping the delay to
> over 540ns.
...
>
> Firstly, the symptoms are the same between 2.6.9 and 2.6.10-rc2.  The halt
> seem to always be in exactly the same place. If it was a timing problem I
> would have thought it would have varied.
>
> Secondly, the command before seems to have an expectation of a 2048
> transfer rather than the 0 in the command, before the problem and then you
> get the strange DRQ=1 but 0 in the len register.
>
> Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x0
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0 stat=0x50
> ireason=3 len=2048 rq len=0
> Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x1b
> Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b stat=0x58
> ireason=2 len=0 rq len=0
>

I have looked at this "every which way" to try and understand what is 
happening, and now think I have a handle on it.

The Test Unit Ready command (cmd=0x00) is the first command in the sequence 
that is issued with an expectation of no data being returned (rq len=0), but 
for some reason (which I don't know why) the device has decided it has data 
to send us (len=2048). I was confused for a long time because in normal 
circumstances cdrom_newpc_intr() is entered twice for a packet command with a 
data transfer attached, the first time with DRQ asserted and rq len == len, 
and once the transfer has taken place with DRQ not asserted and rq len having 
been set to 0 whilst len remains at its original value.  But in the case 
shown above, the first time cdrom_newpc_intr() is entered rq len (actually 
rq->data_len) is set to zero

In this case and (as far as I can see related to the packet commands) only 
this case, with my drive (CW078D), there is a delay between IRQ being 
asserted to say the command has been completed and DRQ being raised. (DRQ for 
the other cases seems to be asserted by the time of the interrupt).  

Once DRQ is asserted in this way, and until some command tries to read the 
data it wants to send us, it remains asserted.  That is why the start stop 
unit command (0x1b) appears to be the one that is failing.  In one trial I 
conducted I made this command into a no op - returning successfully, and the 
problem just full into the next command in sequence.

I was also experiencing different symptoms between 2.6.9 and 2.6.10-rc2, in 
that a change made to solve the problem in 2.6.9 (changing the delay in 
drive_is_ready to 600ns) did not appear to work in 2.6.10-rc2.  I haven't 
quite got to the bottom of this - but as the patch below shows, I have 
plugged what might be race conditions similar to the one identified by Alan 
Cox.  I do not fully understand the logic of the Alan's race condition, as 
from comments elsewhere in the code (and I really don't know how it really 
works) it says linux does not allow rentrant interrupts on the same unit - so 
I don't understand why the spinlocks need to be there, so my logic maybe 
totally spurious in this case - except it did make the difference between it 
working in 2.6.9 and getting it to work in 2.6.10-rc2.

 I did attempt in early tests I did with 2.6.9 to determine the accurately the 
size of the delay needed, it appeared to be around 540ns (400ns was not 
enough).  I have done all recent tests with 2.6.10-rc2 and 400ns is not 
enough 600ns is.  I have searched all over the ATAPI spec, but I can not see 
any really definitive statement on DRQ assertion in this case.  There is of 
course the 400ns delay in removing IRQ after either reading the status 
register or writing to the command register (clause 5.2.9), which the code 
seems to interpret as 400ns after either occurrence, but which my reading of 
the spec would be 400ns after the first occurence of one of these.  As I 
indicated in a previous note on this, as we always read the status register 
at the start of interrupt processing (in drive_is_ready), perhaps the 
solution is to store jiffies at this point and then delay 400ns - difference 
between jiffies now and as we exit ide_intr().

The other place where any delay is mentioned is when transitioning from bus 
idle state to various command state following issue of a command. Again the 
400ns is mentioned.  In the specific case of the packet command, this delay 
seems to need to be when the packet command is written - and before the 
command packet is transfered.  The only delay mentioned after that point is 
the one pio cycle (where it recommends reading the alt status reg) before 
checking status.  Given that we read the status twice (drive_is_ready and 
cdrom_decode_status) it appears we fufil this rule.

In fact, unless someone can point me at another place in the spec, the 
implication is that DRQ will be correct at the time of the interrupt and two 
reads of the status register that we do when entering cdrom_newpc_intr.  So I 
Have to conclude that my CD is operating outside of spec in this case.

This patch below is not a definitive answer to the issue, rather an 
illustration of the test envionment I was using to show things working.  You 
will see from the patches in drivers/ide/ide-iops.c that I have removed most 
of the timing delays, mainly to get a real handle on what was needed in 
ide-cd.c.  I never had any problem with delays apart from the one described 
above 

Index: drivers/ide/ide-iops.c
===================================================================
--- drivers/ide/ide-iops.c (.../vendor/2.6.10-rc2) (revision 14)
+++ drivers/ide/ide-iops.c (.../akc/delay) (revision 14)
@@ -476,10 +476,6 @@
  if (drive->waiting_for_dma)
   return hwif->ide_dma_test_irq(drive);
 
-#if 0
- /* need to guarantee 400ns since last command was issued */
- udelay(1);
-#endif
 
 #ifdef CONFIG_IDEPCI_SHARE_IRQ
  /*
@@ -560,7 +556,6 @@
   return 1;
  }
 
- udelay(1); /* spec allows drive 400ns to assert "BUSY" */
  if ((stat = hwif->INB(IDE_STATUS_REG)) & BUSY_STAT) {
   local_irq_set(flags);
   timeout += jiffies;
@@ -589,11 +584,13 @@
   * rather than expensively fail things immediately.
   * This fix courtesy of Matthew Faupel & Niccolo Rigacci.
   */
- for (i = 0; i < 10; i++) {
-  udelay(1);
+ 
+ /* take out this fix temporarily to test out what is happening without 
distractions*/
+/* for (i = 0; i < 10; i++) {
+  udelay(1);*/
   if (OK_STAT((stat = hwif->INB(IDE_STATUS_REG)), good, bad))
    return 0;
- }
+/* }*/
  *startstop = DRIVER(drive)->error(drive, "status error", stat);
  return 1;
 }
Index: drivers/ide/ide-cd.c
===================================================================
--- drivers/ide/ide-cd.c (.../vendor/2.6.10-rc2) (revision 14)
+++ drivers/ide/ide-cd.c (.../akc/delay) (revision 14)
@@ -890,8 +890,13 @@
   ide_execute_command(drive, WIN_PACKETCMD, handler, ATAPI_WAIT_PC, 
cdrom_timer_expiry);
   return ide_started;
  } else {
-  /* packet command */
-  HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+  unsigned long flags;
+
+                /* packet command */
+  spin_lock_irqsave(&ide_lock, flags);
+  HWIF(drive)->OUTBSYNC(drive, WIN_PACKETCMD, IDE_COMMAND_REG);
+  ndelay(400);
+  spin_unlock_irqrestore(&ide_lock, flags);
   return (*handler) (drive);
  }
 }
@@ -938,8 +943,12 @@
   cmd_len = ATAPI_MIN_CDB_BYTES;
 
  /* Send the command to the device. */
+ unsigned long flags; 
+ spin_lock_irqsave(&ide_lock, flags);
  HWIF(drive)->atapi_output_bytes(drive, rq->cmd, cmd_len);
-
+ ndelay(400);
+ spin_unlock_irqrestore(&ide_lock, flags);
+ 
  /* Start the DMA if need be */
  if (info->dma)
   hwif->dma_start(drive);
@@ -1649,6 +1658,8 @@
  xfer_func_t *xferfunc;
  unsigned long flags;
 
+ /* delay here enough to ensure the command has had time to complete and 
decide if its going to set DRQ */
+ ndelay(600);
  /* Check for errors. */
  dma_error = 0;
  dma = info->dma;
@@ -1665,7 +1676,6 @@
   */
  if (dma) {
   if (dma_error) {
-   printk("ide-cd: dma error\n");
    __ide_dma_off(drive);
    return DRIVER(drive)->error(drive, "dma error", stat);
   }
@@ -1737,10 +1747,19 @@
   if (blen > thislen)
    blen = thislen;
 
-  xferfunc(drive, ptr, blen);
-
   thislen -= blen;
   len -= blen;
+  
+  if((thislen <=0) && (len<=0)) {
+   unsigned long flags;
+   /* this is the last of this output */
+   spin_lock_irqsave(&ide_lock, flags);
+   xferfunc(drive, ptr, blen);
+   ndelay (400);
+   spin_unlock_irqrestore(&ide_lock, flags);
+  } else
+   xferfunc(drive,ptr,blen);
+  
   rq->data_len -= blen;
 
   if (rq->bio)
@@ -1748,7 +1767,7 @@
   else
    rq->data += blen;
  }
-
+ 
  /*
   * pad, if necessary
   */
@@ -1756,10 +1775,19 @@
   while (len > 0) {
    int pad = 0;
 
-   xferfunc(drive, &pad, sizeof(pad));
    len -= sizeof(pad);
+   if (len <= 0) {
+    unsigned long flags;
+
+    /* last output */
+    spin_lock_irqsave(&ide_lock,flags);
+    xferfunc(drive, &pad, sizeof(pad));
+    ndelay(400);
+    spin_unlock_irqrestore(&ide_lock,flags);
+   } else
+    xferfunc(drive, &pad, sizeof(pad));
   }
- }
+ } 
 
  if (HWGROUP(drive)->handler != NULL)
   BUG();



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* ide-cd problem revisited - more brainpower needed
  2004-11-30  8:59                                     ` Alan Chandler
@ 2004-12-10 21:32                                       ` Alan Chandler
  2004-12-10 23:14                                         ` Alan Cox
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-12-10 21:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

A couple of weeks ago, I posted in relation to a problem I have burning CDs. 

Ever since, I have solidly been looking for what might be the cause of the 
problem, and although I have identified what goes wrong, I can not see why 
that should be the case or whether it can be corrected.  Hence the request 
here.
 
All the above output is a result of the command (as root) using linux 
2.6.10-rc2 (with acpi turned off to avoid that bug)·

cdrecord -dev /dev/hdc blank=fast

The device is identified as
CW078D CD-R/RW 
a generic no name rewriter.

There is also a DVD reader on this ide channel (/dev/hdd) but it is not doing 
anything through out the testing.  I have tried removing it, 2.6.9 still 
gives the same symptoms (although I have no printk's in it), 2.6.10-rc panics 
during boot.
 ...

On Tuesday 30 November 2004 08:59, Alan Chandler wrote:
> On Tuesday 23 November 2004 21:49, Alan Chandler wrote:
>
> > Firstly, the symptoms are the same between 2.6.9 and 2.6.10-rc2.  The
> > halt seem to always be in exactly the same place. If it was a timing
> > problem I would have thought it would have varied.
> >
> > Secondly, the command before seems to have an expectation of a 2048
> > transfer rather than the 0 in the command, before the problem and then
> > you get the strange DRQ=1 but 0 in the len register.
> >
> > Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x0
> > Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x0
> > stat=0x50 ireason=3 len=2048 rq len=0
> > Nov 23 20:37:33 kanger kernel: ide-cd:ide_do_rq_cdrom - cmd = 0x1b
> > Nov 23 20:37:33 kanger kernel: ide-cd:cdrom_newpc_intr - cmd=0x1b
> > stat=0x58 ireason=2 len=0 rq len=0

For the past couple of weeks I have been trying different things with printk 
statements to get to the bottom of what is actually happening - I have now 
got to the point where I do not know how to get more out of it, and the 
device is behaving so strangely that I am not sure how it ever worked - 
although it does under linux 2.4 series.

The lines above are misleading - because as can be seen from below, interrupts 
occur after the command has supposedly finished - and the next request has 
already arrived.  I have added a msleep(10) in the back-end of sg_io so that 
there is a short delay before cdrecord gets control back, and its therefore 
possible to see the interrupts beyond the end of processing and before the 
next command starts.  Before I did that, I think the commands were overlapping 
in some way - and therefore the symptoms were not visable immediately.

Comments are inserted into the output to my log from printk statements below.  
the "time" is the current value of "jiffies".  I have also removed some of 
the "bodges" in ide_wait_stat to loop for 10uS - Does not appear necessary 
and I did not want to confuse the issue with them.  Alan Cox's patch and 
various other ndelay(400) statements are in to ensure that nowhere can a 
statement to clear INTR re-interrupt by the command exiting too fast (although 
in most cases it appears to make no difference if they are there or not). 


Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0

I check feature reg here which indicates whether a dma (or in this case not) 
is about to happen

Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941469 
for cmd=0x0

This is routine called after issuing the PACKET command to load the packet 
into the device.

Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941469 
with cmd=0x0 stat=0x50, req len=0, len=0, ireason=3

This is the interrupt at command completion - showing the length expected in 
the request and the len read from the device at this point.  Its after 
cdrom_decode_status has read and checked that there is not error.  "stat" is 
the value returned in the status register.

Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 0

This is back in the sg_io route, with the calculated time (in jiffies) to 
execute the command.


Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941480 
for cmd=0x5a
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941481 
with cmd=0x5a stat=0x58, req len=60, len=60, ireason=2
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941481 
with cmd=0x5a stat=0x50, req len=0, len=60, ireason=3
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 1

This is similar, with a pio data transfer. Note how cdrom_newpc_intr is called 
twice, once when the device is ready for the transfer, and then at the end of 
the transfer (command complete).

Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941492 
for cmd=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941492 
with cmd=0x0 stat=0x50, req len=0, len=0, ireason=3
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 0
Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941503 
for cmd=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941503 
with cmd=0x0 stat=0x50, req len=0, len=0, ireason=3
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 0
Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941514 
for cmd=0x55
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941514 
with cmd=0x55 stat=0x58, req len=60, len=60, ireason=0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941516 
with cmd=0x55 stat=0x50, req len=0, len=60, ireason=3
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 2
Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x0
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941527 
for cmd=0x3c
Dec 10 20:00:54 kanger kernel: cmd[0]=0x3c
Dec 10 20:00:54 kanger kernel: cmd[1]=0x0
Dec 10 20:00:54 kanger kernel: cmd[2]=0x0
Dec 10 20:00:54 kanger kernel: cmd[3]=0x0
Dec 10 20:00:54 kanger kernel: cmd[4]=0x0
Dec 10 20:00:54 kanger kernel: cmd[5]=0x0
Dec 10 20:00:54 kanger kernel: cmd[6]=0x0
Dec 10 20:00:54 kanger kernel: cmd[7]=0x0
Dec 10 20:00:54 kanger kernel: cmd[8]=0x4
Dec 10 20:00:54 kanger kernel: cmd[9]=0x0

This is the first time something goes wrong - and its always on this READ 
BUFFER command (0x3c).  I therefore dump out the command at this stage - 
showing the internal length of 4 bytes, matching the len below.

Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941527 
with cmd=0x3c stat=0x58, req len=4, len=4, ireason=2
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr done at time=4294941527 
with cmd=0x3c stat=0x50, req len=0, len=4, ireason=3
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 0

IO should be complete here - its the standard termination - but ...

Dec 10 20:00:54 kanger kernel: ide-io:ide_intr - seems we interrupted 
ourselves at time=4294941527, stat=0x58, ireason=0x2,len=4

This printk detects inside ide-io:ide_intr where it checks to see if a null 
handler is defined.  It is, and so tries to wack the status register.  I have 
added in code to not only do that, but if "ireason==2" to read in the data 
register "len" times (If I do not do this, the next command fails in 
ide_wait_stat failing to get a good status).  There is also an added 
ndelay(400) here to ensure the interrupt does not remain pending after the 
return.

Dec 10 20:00:54 kanger kernel: ide-io:ide_intr - seems we interrupted 
ourselves at time=4294941527, stat=0x50, ireason=0x3,len=4
Dec 10 20:00:54 kanger kernel: ide-io:ide_intr - seems we interrupted 
ourselves at time=4294941527, stat=0x50, ireason=0x3,len=4

I am not sure why there are two more interrupts - in case I hadn't got a long 
enough delay I increased it from ndelay(400) to ndelay(800) but it made no 
difference.

As you can all of these at at the same jiffy (ie within 1 millisec) of each 
other.  bits 6:5 of word 0 of the identify data is set to 10 (50uS) for 
length of time for command to complete.

However, eventually the command is seen to complete....

Dec 10 20:00:54 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom
Dec 10 20:00:54 kanger kernel: ide-cd:ide_do_rw_cdrom - do block pc
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_start_packet_command with feature 
reg=0x1

The first time with dma.


Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_do_newpc_cont at time=4294941538 
for cmd=0x3c
Dec 10 20:00:54 kanger kernel: cmd[0]=0x3c
Dec 10 20:00:54 kanger kernel: cmd[1]=0x0
Dec 10 20:00:54 kanger kernel: cmd[2]=0x0
Dec 10 20:00:54 kanger kernel: cmd[3]=0x0
Dec 10 20:00:54 kanger kernel: cmd[4]=0x0
Dec 10 20:00:54 kanger kernel: cmd[5]=0x0
Dec 10 20:00:54 kanger kernel: cmd[6]=0x0
Dec 10 20:00:54 kanger kernel: cmd[7]=0xfc
Dec 10 20:00:54 kanger kernel: cmd[8]=0x0
Dec 10 20:00:54 kanger kernel: cmd[9]=0x0

With a lot larger transfer

Dec 10 20:00:54 kanger kernel: ide-dma:ide_dma_start - with command 0x8
Dec 10 20:00:54 kanger kernel: ide-cd:cdrom_newpc_intr - assuming just 
finished dma at time=4294941538 with cmd=0x3c, rq len=64512, stat=0x50
Dec 10 20:00:54 kanger kernel: scsi_ioctl:sg_io completed in time 0
Dec 10 20:00:59 kanger kernel: scsi-ioctl: SG_IO
Dec 10 20:00:59 kanger kernel: ide-iops:ide_wait_stat - was busy when I 
arrived

What this means is that when ide_wait_stat was entered the BSY bit was set on 
the first time the routine is entered.  Given the timing, it has in fact had  
BSY showing with no sign of it clearing for a full 5 secs. 

Dec 10 20:00:59 kanger kernel: hdc: status timeout: status=0xd0 { Busy }
Dec 10 20:00:59 kanger kernel: hdc: status timeout: error=0xd0LastFailedSense 
0x0d
Dec 10 20:00:59 kanger kernel: hdc: DMA disabled
Dec 10 20:00:59 kanger kernel: hdc: drive not ready for command

And so we eventually fail completely.

This pattern is fully predictable - its not a timing issue that varies from 
run to run - every time exactly the same thing happens.

So what do I do next?  Why would the hardware work this way - is it a bug in 
the firmware?, is there a subtle timing problem causing interrupts to 
re-enter... should I just junk the hardware and start again?  Help!

-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-10 21:32                                       ` ide-cd problem revisited - more brainpower needed Alan Chandler
@ 2004-12-10 23:14                                         ` Alan Cox
  2004-12-12  0:17                                           ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Cox @ 2004-12-10 23:14 UTC (permalink / raw)
  To: Alan Chandler; +Cc: Linux Kernel Mailing List, Jens Axboe

On Gwe, 2004-12-10 at 21:32, Alan Chandler wrote:
> All the above output is a result of the command (as root) using linux 
> 2.6.10-rc2 (with acpi turned off to avoid that bug)·
> 

Is local apic enabled ?

> So what do I do next?  Why would the hardware work this way - is it a bug in 
> the firmware?, is there a subtle timing problem causing interrupts to 
> re-enter... should I just junk the hardware and start again?  Help!

A purely armwaving guess of the moment is that if the IRQ routing is
confused over edge versus level trigger then you would see extra
interrupts. Does the drive work in another PC (I forget if you tried
that)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-10 23:14                                         ` Alan Cox
@ 2004-12-12  0:17                                           ` Alan Chandler
  2004-12-12 11:39                                             ` Alan Cox
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-12-12  0:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, Jens Axboe

On Friday 10 December 2004 23:14, Alan Cox wrote:
> On Gwe, 2004-12-10 at 21:32, Alan Chandler wrote:
> > All the above output is a result of the command (as root) using linux
> > 2.6.10-rc2 (with acpi turned off to avoid that bug)·
>
> Is local apic enabled ?

I don't understand what you mean

>
> > So what do I do next?  Why would the hardware work this way - is it a bug
> > in the firmware?, is there a subtle timing problem causing interrupts to
> > re-enter... should I just junk the hardware and start again?  Help!
>
> A purely armwaving guess of the moment is that if the IRQ routing is
> confused over edge versus level trigger then you would see extra
> interrupts. Does the drive work in another PC (I forget if you tried
> that)

I am doing all this at home on a personal level, so I am somewhat restricted. 

My normal CDRW machine is an Athlon 2100+ running on an Elite K7S5A 
motherboard.

I have a server running Celeron 1.7G and PlatiniX (Intel 845 Chipset) 
motherboard.  I moved the CDRW over to it and tried it. I didn't have time to 
build a special version of the system for it, and its running debian sarge 
with 2.6.8.1.  It failed to work, the symptoms were similar but slightly 
different.

-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-12  0:17                                           ` Alan Chandler
@ 2004-12-12 11:39                                             ` Alan Cox
  2004-12-12 13:34                                               ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Cox @ 2004-12-12 11:39 UTC (permalink / raw)
  To: Alan Chandler; +Cc: Linux Kernel Mailing List, Jens Axboe

On Sul, 2004-12-12 at 00:17, Alan Chandler wrote:
> My normal CDRW machine is an Athlon 2100+ running on an Elite K7S5A 
> motherboard.
> 
> I have a server running Celeron 1.7G and PlatiniX (Intel 845 Chipset) 
> motherboard.  I moved the CDRW over to it and tried it. I didn't have time to 
> build a special version of the system for it, and its running debian sarge 
> with 2.6.8.1.  It failed to work, the symptoms were similar but slightly 
> different.

Thanks ok so it moves with the drive. I'm beginning to wonder if it is
just a dud drive.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-12 11:39                                             ` Alan Cox
@ 2004-12-12 13:34                                               ` Alan Chandler
  2004-12-14  0:20                                                 ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-12-12 13:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, Jens Axboe

On Sunday 12 December 2004 11:39, Alan Cox wrote:

> Thanks ok so it moves with the drive. I'm beginning to wonder if it is
> just a dud drive.

I do too and am almost ready to throw in the towel (and I seem to be almost 
unique in experiencing these problems) - except

1) There is an open bug report on debian (#265747) of someone with the same 
model of drive having problems (and to which I added myself).  Later down the 
bug discussion someone else has chipped in with a Cyberdrive (different 
model) with problems.



3) This is absolutely consistent every run.  Its the READ BUFFER command where 
it first occurs (without DMA), and this is not the first time that data has 
been transfered (via pio).  However, under both Windows XP and linux 2.4 
(using the ide-scsi module) the drive works perfectly.

[Thinks - maybe I should spend some time looking at the 2.4 code and 
understanding the differences]

Yesterday, I eventually got to the Cyberdrive Web site (it seems to have been 
unavailable for a while) and downloaded the latest firmware upgrade.  The 
upgrade worked perfectly (had to do it under windows) and it made absolutely 
no difference.




-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-12 13:34                                               ` Alan Chandler
@ 2004-12-14  0:20                                                 ` Alan Chandler
  2004-12-16 15:56                                                   ` Bill Davidsen
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Chandler @ 2004-12-14  0:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox, Jens Axboe

On Sunday 12 December 2004 13:34, Alan Chandler wrote:
> On Sunday 12 December 2004 11:39, Alan Cox wrote:
> > Thanks ok so it moves with the drive. I'm beginning to wonder if it is
> > just a dud drive.
>
> I do too and am almost ready to throw in the towel (and I seem to be almost
> unique in experiencing these problems) - except

Towel now thrown.

...

> and
> linux 2.4 (using the ide-scsi module) the drive works perfectly.

Drive is obviously degrading, I tried it "one more time" today with 2.4, and 
it only just managed to work.  It managed to finish the task, but there were 
several resets and error messages very similar to the ones in 2.6.

My apologies to others time that I might have wasted, although it personally 
was a good learning experience on how the kernel works.
-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-14  0:20                                                 ` Alan Chandler
@ 2004-12-16 15:56                                                   ` Bill Davidsen
  2004-12-17 23:59                                                     ` Alan Chandler
  0 siblings, 1 reply; 37+ messages in thread
From: Bill Davidsen @ 2004-12-16 15:56 UTC (permalink / raw)
  To: Alan Chandler; +Cc: linux-kernel, Alan Cox, Jens Axboe

Alan Chandler wrote:
> On Sunday 12 December 2004 13:34, Alan Chandler wrote:
> 
>>On Sunday 12 December 2004 11:39, Alan Cox wrote:
>>
>>>Thanks ok so it moves with the drive. I'm beginning to wonder if it is
>>>just a dud drive.
>>
>>I do too and am almost ready to throw in the towel (and I seem to be almost
>>unique in experiencing these problems) - except
> 
> 
> Towel now thrown.
> 
> ...
> 
> 
>>and
>>linux 2.4 (using the ide-scsi module) the drive works perfectly.
> 
> 
> Drive is obviously degrading, I tried it "one more time" today with 2.4, and 
> it only just managed to work.  It managed to finish the task, but there were 
> several resets and error messages very similar to the ones in 2.6.

Is it? Or did the firmware "upgrade" make it worse? Can you reflash back 
to the original firmware and see if 2.4 works correctly?

> My apologies to others time that I might have wasted, although it personally 
> was a good learning experience on how the kernel works.


-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: ide-cd problem revisited - more brainpower needed
  2004-12-16 15:56                                                   ` Bill Davidsen
@ 2004-12-17 23:59                                                     ` Alan Chandler
  0 siblings, 0 replies; 37+ messages in thread
From: Alan Chandler @ 2004-12-17 23:59 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel, Alan Cox, Jens Axboe

On Thursday 16 December 2004 15:56, Bill Davidsen wrote:

>
> Is it? Or did the firmware "upgrade" make it worse? Can you reflash back
> to the original firmware and see if 2.4 works correctly?

Unfortunately not - the version I 'upgraded' from was not on their web site, 
so I don't have a way back.

I was a thought that crossed my mind, although the detailed problem of 
multiple interrupts after apparently successfully completing the 0x3c command 
still occurred in exactly the same way and place when using 2.6.10-rc3.



-- 
Alan Chandler
alan@chandlerfamily.org.uk
First they ignore you, then they laugh at you,
 then they fight you, then you win. --Gandhi

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2004-12-17 23:59 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-20 18:42 ide-cd problem Alan Chandler
2004-11-20 19:47 ` Jens Axboe
2004-11-21  0:53   ` Alan Chandler
2004-11-21  8:56     ` Jens Axboe
2004-11-21 10:25       ` Alan Chandler
2004-11-21 16:13         ` Alan Chandler
2004-11-22  7:52           ` Alan Chandler
2004-11-22  8:01             ` Jens Axboe
2004-11-22 10:30               ` Alan Chandler
2004-11-22 10:51                 ` Jens Axboe
2004-11-22 11:29                   ` Alan Chandler
2004-11-22 11:31                     ` Jens Axboe
2004-11-22 12:53                       ` Alan Chandler
2004-11-22 13:02                         ` Jens Axboe
2004-11-22 19:19                           ` Alan Chandler
2004-11-22 23:48                             ` Alan Chandler
2004-11-23  7:13                               ` Alan Chandler
2004-11-23 14:51                                 ` Jens Axboe
2004-11-23 21:49                                   ` Alan Chandler
2004-11-26 23:39                                     ` Alan Chandler
2004-11-29 17:29                                       ` Bill Davidsen
2004-11-30  8:59                                     ` Alan Chandler
2004-12-10 21:32                                       ` ide-cd problem revisited - more brainpower needed Alan Chandler
2004-12-10 23:14                                         ` Alan Cox
2004-12-12  0:17                                           ` Alan Chandler
2004-12-12 11:39                                             ` Alan Cox
2004-12-12 13:34                                               ` Alan Chandler
2004-12-14  0:20                                                 ` Alan Chandler
2004-12-16 15:56                                                   ` Bill Davidsen
2004-12-17 23:59                                                     ` Alan Chandler
2004-11-24 23:19                           ` ide-cd problem Alan Cox
2004-11-25 15:29                             ` Jens Axboe
2004-11-25 16:25                               ` Alan Cox
2004-11-25 18:12                                 ` Jens Axboe
2004-11-25 18:45                                   ` Alan Chandler
2004-11-23 18:34               ` Jeff Garzik
2004-11-23 19:13                 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).