* Re: OOPS in idescsi_end_request
@ 2003-01-23 8:52 Andrey Borzenkov
2003-01-23 9:19 ` Strong kernel lock with linux-2.5.59 : futex in Huge Pages dada1
2003-01-24 3:15 ` OOPS in idescsi_end_request Brian King
0 siblings, 2 replies; 9+ messages in thread
From: Andrey Borzenkov @ 2003-01-23 8:52 UTC (permalink / raw)
To: James Stevenson, Brian King; +Cc: linux-kernel
As long as we cannot easily abort IDE request (correct me if I am wrong) the workaround seems to be to mark current request as aborted in idescsi_abort and ignore it later in idescsi_end_request, i.e. something like (with new flag PC_ABORTED)
struct request *rq = HWGROUP(idescsi_drives[cmd->target])->rq;
idescsi_pc_t *pc = (idescsi_pc_t *) rq->buffer;
pc->flags |= PC_ABORTED;
and later on assume we can ignore SCSI layer completely in this case in idescsi_end_request, just do general cleanup.
If you can reliably reproduce the problem you could give it a try.
Anybody sees yet another race condition here? :))
-andrey
> While burning a CD tonight I ended up taking an oops on my system. I had
> the lkcd patch applied to my 2.4.19 kernel, so I was able to look at the
> oops after my system rebooted. After digging into it a little and
> looking at the ide-scsi code I think I found the problem but am not
> sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the scsi
> mid layer? I think what is happening is that a command times out,
> idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
> idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
> point the scsi mid-layer owns the scsi_cmnd and returns the failure back
> up the chain. Later on, the command gets run through
> idescsi_end_request, which then tries to access the scsi_cmnd structure
> which is it no longer owns.
>
> Any help is appreciated. I have a complete lkcd dump of the failure if
> anyone would like more information...
>
> -Brian King
>
>
> Here is the last bit in the log buffer:
>
> <4>scsi : aborting command due to timeout : pid 2534304, scsi0,
> channel 0, id 1, lun 0 Write (10) 00 00 01 1e 91 00 00 1b 00
> <4>hdk: timeout waiting for DMA
> <4>ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> <4>hdk: status timeout: status=0xd8 { Busy }
> <4>hdk: drive not ready for command
> <4>hdk: ATAPI reset complete
> <4>hdk: irq timeout: status=0x80 { Busy }
> <4>hdk: ATAPI reset complete
> <4>hdk: irq timeout: status=0x80 { Busy }
> <1>Unable to handle kernel NULL pointer dereference at virtual
> address 00000184
> <4> printing eip:
> <4>e0fd22f1
> <1>*pde = 00000000
> <4>Oops: 0002
> <4>CPU: 0
> <4>EIP: 0010:[<e0fd22f1>] Tainted: PF
> <4>EFLAGS: 00010046
> <4>eax: 00000000 ebx: 00000000 ecx: dfef8000 edx: c75bcbc0
> <4>esi: 00000080 edi: c0491938 ebp: d5908000 esp: c0435ea4
> <4>ds: 0018 es: 0018 ss: 0018
> <4>Process swapper (pid: 0, stackpage=c0435000)
> <4>Stack: c0491938 00000000 00000000 c0491938 00000088 000001f4
> c03349e2 c75bcbc0
> <4> ce0a3b80 c0491938 00000080 00000080 c75bcbc0 c0222d6c
> 00000000 c1671580
> <4> 00000080 c04918f4 c0491938 c0434000 c1671580 e0fd2550
> c0223b30 c0491938
> <4>Call Trace: [<c0222d6c>] [<e0fd2550>] [<c0223b30>]
> [<c0223990>] [<c0127af0>]
> <4> [<c01233d4>] [<c01232a6>] [<c01230ed>] [<c010a97f>]
> [<c010d173>] [<c0106f80>]
> <4> [<c0106fa3>] [<c0107012>] [<c0105000>]
> <4>
> <4>Code: c7 80 84 01 00 00 00 00 07 00 75 72 9c 5e fa bb 00 e0 ff ff
>
>
> From lkcd:
>
> ================================================================
> STACK TRACE FOR TASK: 0xc0434000 (swapper)
>
> 0 [ide-scsi]idescsi_end_request+129 [0xe0fd22f1]
> TRACE ERROR 0x800000000
^ permalink raw reply [flat|nested] 9+ messages in thread
* Strong kernel lock with linux-2.5.59 : futex in Huge Pages
2003-01-23 8:52 OOPS in idescsi_end_request Andrey Borzenkov
@ 2003-01-23 9:19 ` dada1
2003-01-23 9:41 ` Andrew Morton
2003-01-24 3:15 ` OOPS in idescsi_end_request Brian King
1 sibling, 1 reply; 9+ messages in thread
From: dada1 @ 2003-01-23 9:19 UTC (permalink / raw)
To: linux-kernel
Hello
I found a way to lock a linux-2.5.59 in all cases, in using futexes landing
in a HugeTLB page.
You need to be root to be able to obtain HugePages (or CAP_IPC_LOCK
capability)
I suspect that the kernel/futex.c:__pin_page(unsigned long addr) or
mm/memory.c:follow_page() are not HugeTLB page aware.
How you can reproduce it. (dont do it of course, unless you really want to
debug the thing)
# grep TLB .config
CONFIG_HUGETLB_PAGE=y
CONFIG_HUGETLBFS=y
# mount -t hugetlbfs none /huge
# echo 10 > /proc/sys/vm/nr_hugepages
# ./prog
Before futex() call
*********** HANG *********
# cat prog.c
#include <sys/mman.h>
#include <asm/unistd.h>
#include <errno.h>
#ifndef __NR_futex
#define __NR_futex 240
#endif
_syscall4(int, futex, unsigned long, uaddr, int, op, int, val, struct
timespec *,utime)
res = futex((unsigned long)ptr, /*FUTEX_WAIT*/0, 3, 0) ;
main(int argc, char *argv[])
{
int res ;
int fd = open("/huge/futex_bug", O_RDWR | O_CREAT, 0644) ;
char *ptr ;
if (fd == -1) { perror("/huge/futex_bug") ; exit(1);}
ptr = mmap(0x30000000, 4*1024*1024, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd,
0) ;
if (ptr == -1) { perror("mmap") ; exit(1);}
*(int *)ptr = 2 ; /* init the futex with value 2 */
printf("Before futex() call\n") ;
res = futex((unsigned long)ptr, /*FUTEX_WAIT*/0, 3, 0) ;
printf("futex->%d errno=%d\n", res, errno) ;
}
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 801.570
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse
bogomips : 1585.15
Thanks
Eric Dumazet
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strong kernel lock with linux-2.5.59 : futex in Huge Pages
2003-01-23 9:19 ` Strong kernel lock with linux-2.5.59 : futex in Huge Pages dada1
@ 2003-01-23 9:41 ` Andrew Morton
0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2003-01-23 9:41 UTC (permalink / raw)
To: dada1; +Cc: linux-kernel
"dada1" <dada1@cosmosbay.com> wrote:
>
> Hello
>
> I found a way to lock a linux-2.5.59 in all cases, in using futexes landing
> in a HugeTLB page.
>
> You need to be root to be able to obtain HugePages (or CAP_IPC_LOCK
> capability)
>
> I suspect that the kernel/futex.c:__pin_page(unsigned long addr) or
> mm/memory.c:follow_page() are not HugeTLB page aware.
>
> How you can reproduce it. (dont do it of course, unless you really want to
> debug the thing)
Yup, futexes and direct-io against hugepages are bust. I was going to fix
that up this week, but got distracted.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: OOPS in idescsi_end_request
2003-01-23 8:52 OOPS in idescsi_end_request Andrey Borzenkov
2003-01-23 9:19 ` Strong kernel lock with linux-2.5.59 : futex in Huge Pages dada1
@ 2003-01-24 3:15 ` Brian King
2003-01-24 6:21 ` Re[2]: " Andrey Borzenkov
1 sibling, 1 reply; 9+ messages in thread
From: Brian King @ 2003-01-24 3:15 UTC (permalink / raw)
To: Andrey Borzenkov; +Cc: James Stevenson, linux-kernel
Can the interrupt handler even look at the request structure in this
scenario though? I can't say I know much about the ide subsystem but it
seems like after ide-scsi returns success on the reset function then it
should have already pushed scsi_done for the op and should not be
looking at the request structure. Would it be possible to have the
ide-scsi reset routine invoke ide_do_reset? Is there a problem with
multiple callers of ide_do_reset? I would think the right solution would
be to do this and not return success until the reset was successful and
the op was sent back to the mid-layer.
-Brian
Andrey Borzenkov wrote:
> As long as we cannot easily abort IDE request (correct me if I am wrong) the workaround seems to be to mark current request as aborted in idescsi_abort and ignore it later in idescsi_end_request, i.e. something like (with new flag PC_ABORTED)
>
> struct request *rq = HWGROUP(idescsi_drives[cmd->target])->rq;
> idescsi_pc_t *pc = (idescsi_pc_t *) rq->buffer;
> pc->flags |= PC_ABORTED;
>
> and later on assume we can ignore SCSI layer completely in this case in idescsi_end_request, just do general cleanup.
>
> If you can reliably reproduce the problem you could give it a try.
>
> Anybody sees yet another race condition here? :))
>
> -andrey
>
>
>>While burning a CD tonight I ended up taking an oops on my system. I had
>>the lkcd patch applied to my 2.4.19 kernel, so I was able to look at the
>> oops after my system rebooted. After digging into it a little and
>>looking at the ide-scsi code I think I found the problem but am not
>>sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the scsi
>>mid layer? I think what is happening is that a command times out,
>>idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
>>idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
>>point the scsi mid-layer owns the scsi_cmnd and returns the failure back
>>up the chain. Later on, the command gets run through
>>idescsi_end_request, which then tries to access the scsi_cmnd structure
>>which is it no longer owns.
>>
>>Any help is appreciated. I have a complete lkcd dump of the failure if
>>anyone would like more information...
>>
>>-Brian King
>>
>>
>>Here is the last bit in the log buffer:
>>
>> <4>scsi : aborting command due to timeout : pid 2534304, scsi0,
>>channel 0, id 1, lun 0 Write (10) 00 00 01 1e 91 00 00 1b 00
>> <4>hdk: timeout waiting for DMA
>> <4>ide_dmaproc: chipset supported ide_dma_timeout func only: 14
>> <4>hdk: status timeout: status=0xd8 { Busy }
>> <4>hdk: drive not ready for command
>> <4>hdk: ATAPI reset complete
>> <4>hdk: irq timeout: status=0x80 { Busy }
>> <4>hdk: ATAPI reset complete
>> <4>hdk: irq timeout: status=0x80 { Busy }
>> <1>Unable to handle kernel NULL pointer dereference at virtual
>>address 00000184
>> <4> printing eip:
>> <4>e0fd22f1
>> <1>*pde = 00000000
>> <4>Oops: 0002
>> <4>CPU: 0
>> <4>EIP: 0010:[<e0fd22f1>] Tainted: PF
>> <4>EFLAGS: 00010046
>> <4>eax: 00000000 ebx: 00000000 ecx: dfef8000 edx: c75bcbc0
>> <4>esi: 00000080 edi: c0491938 ebp: d5908000 esp: c0435ea4
>> <4>ds: 0018 es: 0018 ss: 0018
>> <4>Process swapper (pid: 0, stackpage=c0435000)
>> <4>Stack: c0491938 00000000 00000000 c0491938 00000088 000001f4
>>c03349e2 c75bcbc0
>> <4> ce0a3b80 c0491938 00000080 00000080 c75bcbc0 c0222d6c
>>00000000 c1671580
>> <4> 00000080 c04918f4 c0491938 c0434000 c1671580 e0fd2550
>>c0223b30 c0491938
>> <4>Call Trace: [<c0222d6c>] [<e0fd2550>] [<c0223b30>]
>>[<c0223990>] [<c0127af0>]
>> <4> [<c01233d4>] [<c01232a6>] [<c01230ed>] [<c010a97f>]
>>[<c010d173>] [<c0106f80>]
>> <4> [<c0106fa3>] [<c0107012>] [<c0105000>]
>> <4>
>> <4>Code: c7 80 84 01 00 00 00 00 07 00 75 72 9c 5e fa bb 00 e0 ff ff
>>
>>
>> From lkcd:
>>
>>================================================================
>>STACK TRACE FOR TASK: 0xc0434000 (swapper)
>>
>> 0 [ide-scsi]idescsi_end_request+129 [0xe0fd22f1]
>>TRACE ERROR 0x800000000
>
>
>
--
Some days it's just not worth chewing through the restraints...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re[2]: OOPS in idescsi_end_request
2003-01-24 3:15 ` OOPS in idescsi_end_request Brian King
@ 2003-01-24 6:21 ` Andrey Borzenkov
2003-01-24 9:06 ` James Stevenson
0 siblings, 1 reply; 9+ messages in thread
From: Andrey Borzenkov @ 2003-01-24 6:21 UTC (permalink / raw)
To: Brian King; +Cc: James Stevenson, linux-kernel
>
> Can the interrupt handler even look at the request structure in this
> scenario though?
Which request? IDE or SCSI? It must look at IDE request of course,
because it is request that is being served. As far as SCSI request is
concerned, the problem remains the same even if it happens later, not
in interrupt handler.
> I can't say I know much about the ide subsystem but it
> seems like after ide-scsi returns success on the reset function then it
> should have already pushed scsi_done for the op and should not be
> looking at the request structure.
Exactly. That is what I have suggested. The only problem is, there
appear to be no easy way to discard request that is currently being
executed. So my suggestion was a workaround to prevent looking at possibly already invalid request field.
The point is you cannot just discard current IDE request. The seems to be no equivalent of SCSI abort function.
> Would it be possible to have the
> ide-scsi reset routine invoke ide_do_reset?
It happens anyway later and is condirmed in kernel output. It is done
from SCSI layer, SCSI first attempts to abort currently running
command and then calls reset (on device, bus and host if needed).
> Is there a problem with
> multiple callers of ide_do_reset?
It is serialised if this is what you ask.
> I would think the right solution would
> be to do this and not return success until the reset was successful and
> the op was sent back to the mid-layer.
>
Current IDE provides no way to wait for reset success. It just sends
reset and then polls device to find out if reset was successful until
time out expires. That is the main problem. My first reaction was as
well "we must wait for reset completion". It is simply not possible
without revamping IDE code (please, correct me if I am wrong). Besides
it does not help us in any way - even when reset is complete you still
must deal with old IDE request. This is exactly what happens here.
The problem is not that we do not call reset, but that we do not
really abort IDE command. We must either do that or simulate it.
> Andrey Borzenkov wrote:
> > As long as we cannot easily abort IDE request (correct me if I am wrong) the workaround seems to be to mark current request as aborted in idescsi_abort and ignore it later in idescsi_end_request, i.e. something like (with new flag PC_ABORTED)
> >
> > struct request *rq = HWGROUP(idescsi_drives[cmd->target])->rq;
> > idescsi_pc_t *pc = (idescsi_pc_t *) rq->buffer;
> > pc->flags |= PC_ABORTED;
> >
> > and later on assume we can ignore SCSI layer completely in this case in idescsi_end_request, just do general cleanup.
Forget this, it is nonsene of course. When idescsi_abort is called,
the request may not even be services yet. So it may be needed to add
pointer to IDE request in idescsi_pc_t and then search for request
in device queue. And properly lock the whole story of course.
Would you agree to test the patch (possibly next week).
cheers
-andrey
> >
> > If you can reliably reproduce the problem you could give it a try.
> >
> > Anybody sees yet another race condition here? :))
> >
> > -andrey
> >
> >
> >>While burning a CD tonight I ended up taking an oops on my system. I had
> >>the lkcd patch applied to my 2.4.19 kernel, so I was able to look at the
> >> oops after my system rebooted. After digging into it a little and
> >>looking at the ide-scsi code I think I found the problem but am not
> >>sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the scsi
> >>mid layer? I think what is happening is that a command times out,
> >>idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
> >>idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
> >>point the scsi mid-layer owns the scsi_cmnd and returns the failure back
> >>up the chain. Later on, the command gets run through
> >>idescsi_end_request, which then tries to access the scsi_cmnd structure
> >>which is it no longer owns.
> >>
> >>Any help is appreciated. I have a complete lkcd dump of the failure if
> >>anyone would like more information...
> >>
> >>-Brian King
> >>
> >>
> >>Here is the last bit in the log buffer:
> >>
> >> <4>scsi : aborting command due to timeout : pid 2534304, scsi0,
> >>channel 0, id 1, lun 0 Write (10) 00 00 01 1e 91 00 00 1b 00
> >> <4>hdk: timeout waiting for DMA
> >> <4>ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> >> <4>hdk: status timeout: status=0xd8 { Busy }
> >> <4>hdk: drive not ready for command
> >> <4>hdk: ATAPI reset complete
> >> <4>hdk: irq timeout: status=0x80 { Busy }
> >> <4>hdk: ATAPI reset complete
> >> <4>hdk: irq timeout: status=0x80 { Busy }
> >> <1>Unable to handle kernel NULL pointer dereference at virtual
> >>address 00000184
> >> <4> printing eip:
> >> <4>e0fd22f1
> >> <1>*pde = 00000000
> >> <4>Oops: 0002
> >> <4>CPU: 0
> >> <4>EIP: 0010:[<e0fd22f1>] Tainted: PF
> >> <4>EFLAGS: 00010046
> >> <4>eax: 00000000 ebx: 00000000 ecx: dfef8000 edx: c75bcbc0
> >> <4>esi: 00000080 edi: c0491938 ebp: d5908000 esp: c0435ea4
> >> <4>ds: 0018 es: 0018 ss: 0018
> >> <4>Process swapper (pid: 0, stackpage=c0435000)
> >> <4>Stack: c0491938 00000000 00000000 c0491938 00000088 000001f4
> >>c03349e2 c75bcbc0
> >> <4> ce0a3b80 c0491938 00000080 00000080 c75bcbc0 c0222d6c
> >>00000000 c1671580
> >> <4> 00000080 c04918f4 c0491938 c0434000 c1671580 e0fd2550
> >>c0223b30 c0491938
> >> <4>Call Trace: [<c0222d6c>] [<e0fd2550>] [<c0223b30>]
> >>[<c0223990>] [<c0127af0>]
> >> <4> [<c01233d4>] [<c01232a6>] [<c01230ed>] [<c010a97f>]
> >>[<c010d173>] [<c0106f80>]
> >> <4> [<c0106fa3>] [<c0107012>] [<c0105000>]
> >> <4>
> >> <4>Code: c7 80 84 01 00 00 00 00 07 00 75 72 9c 5e fa bb 00 e0 ff ff
> >>
> >>
> >> From lkcd:
> >>
> >>================================================================
> >>STACK TRACE FOR TASK: 0xc0434000 (swapper)
> >>
> >> 0 [ide-scsi]idescsi_end_request+129 [0xe0fd22f1]
> >>TRACE ERROR 0x800000000
> >
> >
> >
>
>
> --
> Some days it's just not worth chewing through the restraints...
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Re[2]: OOPS in idescsi_end_request
2003-01-24 6:21 ` Re[2]: " Andrey Borzenkov
@ 2003-01-24 9:06 ` James Stevenson
2003-01-24 13:39 ` Brian King
0 siblings, 1 reply; 9+ messages in thread
From: James Stevenson @ 2003-01-24 9:06 UTC (permalink / raw)
To: Andrey Borzenkov, Brian King; +Cc: linux-kernel
[LARGE SNIP]
> Would you agree to test the patch (possibly next week).
yeah sure.
> cheers
>
> -andrey
>
> > >
> > > If you can reliably reproduce the problem you could give it a try.
> > >
> > > Anybody sees yet another race condition here? :))
> > >
> > > -andrey
> > >
> > >
> > >>While burning a CD tonight I ended up taking an oops on my system. I
had
> > >>the lkcd patch applied to my 2.4.19 kernel, so I was able to look at
the
> > >> oops after my system rebooted. After digging into it a little and
> > >>looking at the ide-scsi code I think I found the problem but am not
> > >>sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the
scsi
> > >>mid layer? I think what is happening is that a command times out,
> > >>idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
> > >>idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
> > >>point the scsi mid-layer owns the scsi_cmnd and returns the failure
back
> > >>up the chain. Later on, the command gets run through
> > >>idescsi_end_request, which then tries to access the scsi_cmnd
structure
> > >>which is it no longer owns.
> > >>
> > >>Any help is appreciated. I have a complete lkcd dump of the failure if
> > >>anyone would like more information...
> > >>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: OOPS in idescsi_end_request
2003-01-24 9:06 ` James Stevenson
@ 2003-01-24 13:39 ` Brian King
0 siblings, 0 replies; 9+ messages in thread
From: Brian King @ 2003-01-24 13:39 UTC (permalink / raw)
To: James Stevenson; +Cc: Andrey Borzenkov, linux-kernel
James Stevenson wrote:
> [LARGE SNIP]
>
>
>>Would you agree to test the patch (possibly next week).
>
>
> yeah sure.
>
I would be happy to test it as well.
-Brian
>
>
>>cheers
>>
>>-andrey
>>
>>
>>>>If you can reliably reproduce the problem you could give it a try.
>>>>
>>>>Anybody sees yet another race condition here? :))
>>>>
>>>>-andrey
>>>>
>>>>
>>>>
>>>>>While burning a CD tonight I ended up taking an oops on my system. I
>
> had
>
>>>>>the lkcd patch applied to my 2.4.19 kernel, so I was able to look at
>
> the
>
>>>>> oops after my system rebooted. After digging into it a little and
>>>>>looking at the ide-scsi code I think I found the problem but am not
>>>>>sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the
>
> scsi
>
>>>>>mid layer? I think what is happening is that a command times out,
>>>>>idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
>>>>>idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
>>>>>point the scsi mid-layer owns the scsi_cmnd and returns the failure
>
> back
>
>>>>>up the chain. Later on, the command gets run through
>>>>>idescsi_end_request, which then tries to access the scsi_cmnd
>
> structure
>
>>>>>which is it no longer owns.
>>>>>
>>>>>Any help is appreciated. I have a complete lkcd dump of the failure if
>>>>>anyone would like more information...
>>>>>
>
>
>
>
--
Some days it's just not worth chewing through the restraints...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: OOPS in idescsi_end_request
2003-01-22 3:41 Brian King
@ 2003-01-22 21:49 ` James Stevenson
0 siblings, 0 replies; 9+ messages in thread
From: James Stevenson @ 2003-01-22 21:49 UTC (permalink / raw)
To: Brian King; +Cc: linux-kernel
Hi
yeah i know the bug i just dont know how to fix it.
the cd drive generates an error and the item of the request Q becomes
null and the idescsi_end_quest expects it to be there.
i have been able to trigger this though the whole 2.4.x tree i mail
this list about every 2 version i just havnt been lucky enough to have
any help with it yet. I think i might have trigger it in the 2.2.x tree
as well but i didnt know very much back then either.
On Wed, 2003-01-22 at 03:41, Brian King wrote:
> While burning a CD tonight I ended up taking an oops on my system. I had
> the lkcd patch applied to my 2.4.19 kernel, so I was able to look at the
> oops after my system rebooted. After digging into it a little and
> looking at the ide-scsi code I think I found the problem but am not
> sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the scsi
> mid layer? I think what is happening is that a command times out,
> idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
> idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
> point the scsi mid-layer owns the scsi_cmnd and returns the failure back
> up the chain. Later on, the command gets run through
> idescsi_end_request, which then tries to access the scsi_cmnd structure
> which is it no longer owns.
>
> Any help is appreciated. I have a complete lkcd dump of the failure if
> anyone would like more information...
>
> -Brian King
>
>
> Here is the last bit in the log buffer:
>
> <4>scsi : aborting command due to timeout : pid 2534304, scsi0,
> channel 0, id 1, lun 0 Write (10) 00 00 01 1e 91 00 00 1b 00
> <4>hdk: timeout waiting for DMA
> <4>ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> <4>hdk: status timeout: status=0xd8 { Busy }
> <4>hdk: drive not ready for command
> <4>hdk: ATAPI reset complete
> <4>hdk: irq timeout: status=0x80 { Busy }
> <4>hdk: ATAPI reset complete
> <4>hdk: irq timeout: status=0x80 { Busy }
> <1>Unable to handle kernel NULL pointer dereference at virtual
> address 00000184
> <4> printing eip:
> <4>e0fd22f1
> <1>*pde = 00000000
> <4>Oops: 0002
> <4>CPU: 0
> <4>EIP: 0010:[<e0fd22f1>] Tainted: PF
> <4>EFLAGS: 00010046
> <4>eax: 00000000 ebx: 00000000 ecx: dfef8000 edx: c75bcbc0
> <4>esi: 00000080 edi: c0491938 ebp: d5908000 esp: c0435ea4
> <4>ds: 0018 es: 0018 ss: 0018
> <4>Process swapper (pid: 0, stackpage=c0435000)
> <4>Stack: c0491938 00000000 00000000 c0491938 00000088 000001f4
> c03349e2 c75bcbc0
> <4> ce0a3b80 c0491938 00000080 00000080 c75bcbc0 c0222d6c
> 00000000 c1671580
> <4> 00000080 c04918f4 c0491938 c0434000 c1671580 e0fd2550
> c0223b30 c0491938
> <4>Call Trace: [<c0222d6c>] [<e0fd2550>] [<c0223b30>]
> [<c0223990>] [<c0127af0>]
> <4> [<c01233d4>] [<c01232a6>] [<c01230ed>] [<c010a97f>]
> [<c010d173>] [<c0106f80>]
> <4> [<c0106fa3>] [<c0107012>] [<c0105000>]
> <4>
> <4>Code: c7 80 84 01 00 00 00 00 07 00 75 72 9c 5e fa bb 00 e0 ff ff
>
>
> From lkcd:
>
> ================================================================
> STACK TRACE FOR TASK: 0xc0434000 (swapper)
>
> 0 [ide-scsi]idescsi_end_request+129 [0xe0fd22f1]
> TRACE ERROR 0x800000000
> ================================================================
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread
* OOPS in idescsi_end_request
@ 2003-01-22 3:41 Brian King
2003-01-22 21:49 ` James Stevenson
0 siblings, 1 reply; 9+ messages in thread
From: Brian King @ 2003-01-22 3:41 UTC (permalink / raw)
To: linux-kernel
While burning a CD tonight I ended up taking an oops on my system. I had
the lkcd patch applied to my 2.4.19 kernel, so I was able to look at the
oops after my system rebooted. After digging into it a little and
looking at the ide-scsi code I think I found the problem but am not
sure. How can idescsi_reset simply return SCSI_RESET_SUCCESS to the scsi
mid layer? I think what is happening is that a command times out,
idescsi_abort is called, which returns SCSI_ABORT_SNOOZE. Later on
idescsi_reset gets called, which returns SCSI_RESET_SUCCESS. At this
point the scsi mid-layer owns the scsi_cmnd and returns the failure back
up the chain. Later on, the command gets run through
idescsi_end_request, which then tries to access the scsi_cmnd structure
which is it no longer owns.
Any help is appreciated. I have a complete lkcd dump of the failure if
anyone would like more information...
-Brian King
Here is the last bit in the log buffer:
<4>scsi : aborting command due to timeout : pid 2534304, scsi0,
channel 0, id 1, lun 0 Write (10) 00 00 01 1e 91 00 00 1b 00
<4>hdk: timeout waiting for DMA
<4>ide_dmaproc: chipset supported ide_dma_timeout func only: 14
<4>hdk: status timeout: status=0xd8 { Busy }
<4>hdk: drive not ready for command
<4>hdk: ATAPI reset complete
<4>hdk: irq timeout: status=0x80 { Busy }
<4>hdk: ATAPI reset complete
<4>hdk: irq timeout: status=0x80 { Busy }
<1>Unable to handle kernel NULL pointer dereference at virtual
address 00000184
<4> printing eip:
<4>e0fd22f1
<1>*pde = 00000000
<4>Oops: 0002
<4>CPU: 0
<4>EIP: 0010:[<e0fd22f1>] Tainted: PF
<4>EFLAGS: 00010046
<4>eax: 00000000 ebx: 00000000 ecx: dfef8000 edx: c75bcbc0
<4>esi: 00000080 edi: c0491938 ebp: d5908000 esp: c0435ea4
<4>ds: 0018 es: 0018 ss: 0018
<4>Process swapper (pid: 0, stackpage=c0435000)
<4>Stack: c0491938 00000000 00000000 c0491938 00000088 000001f4
c03349e2 c75bcbc0
<4> ce0a3b80 c0491938 00000080 00000080 c75bcbc0 c0222d6c
00000000 c1671580
<4> 00000080 c04918f4 c0491938 c0434000 c1671580 e0fd2550
c0223b30 c0491938
<4>Call Trace: [<c0222d6c>] [<e0fd2550>] [<c0223b30>]
[<c0223990>] [<c0127af0>]
<4> [<c01233d4>] [<c01232a6>] [<c01230ed>] [<c010a97f>]
[<c010d173>] [<c0106f80>]
<4> [<c0106fa3>] [<c0107012>] [<c0105000>]
<4>
<4>Code: c7 80 84 01 00 00 00 00 07 00 75 72 9c 5e fa bb 00 e0 ff ff
From lkcd:
================================================================
STACK TRACE FOR TASK: 0xc0434000 (swapper)
0 [ide-scsi]idescsi_end_request+129 [0xe0fd22f1]
TRACE ERROR 0x800000000
================================================================
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-01-24 13:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-23 8:52 OOPS in idescsi_end_request Andrey Borzenkov
2003-01-23 9:19 ` Strong kernel lock with linux-2.5.59 : futex in Huge Pages dada1
2003-01-23 9:41 ` Andrew Morton
2003-01-24 3:15 ` OOPS in idescsi_end_request Brian King
2003-01-24 6:21 ` Re[2]: " Andrey Borzenkov
2003-01-24 9:06 ` James Stevenson
2003-01-24 13:39 ` Brian King
-- strict thread matches above, loose matches on Subject: below --
2003-01-22 3:41 Brian King
2003-01-22 21:49 ` James Stevenson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).