* Re: 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06 6:39 Andries.Brouwer
2003-03-06 6:49 ` Mike Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06 6:39 UTC (permalink / raw)
To: Andries.Brouwer, torvalds; +Cc: linux-kernel, linux-scsi
> See if this fixes it..
No, I am afraid not. My infinite loop does not pass through
scsi_eh_ready_devs().
Andries
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 6:39 2.5.63/64 do not boot: loop in scsi_error Andries.Brouwer
@ 2003-03-06 6:49 ` Mike Anderson
2003-03-06 7:59 ` Zwane Mwaikambo
0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 6:49 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: torvalds, linux-kernel, linux-scsi
Andries.Brouwer@cwi.nl [Andries.Brouwer@cwi.nl] wrote:
> > See if this fixes it..
>
> No, I am afraid not. My infinite loop does not pass through
> scsi_eh_ready_devs().
>
Can you send me your console log? If you have scsi_logging=1, that would
be great too.
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 6:49 ` Mike Anderson
@ 2003-03-06 7:59 ` Zwane Mwaikambo
2003-03-06 8:30 ` Mike Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 7:59 UTC (permalink / raw)
To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
On Wed, 5 Mar 2003, Mike Anderson wrote:
> Andries.Brouwer@cwi.nl [Andries.Brouwer@cwi.nl] wrote:
> > > See if this fixes it..
> >
> > No, I am afraid not. My infinite loop does not pass through
> > scsi_eh_ready_devs().
> >
>
> Can you send me your console log? If you have scsi_logging=1, that would
> be great too.
It would help if you could figure out which paths this goes through, because it
locks up completely right before printing 'scsi: device offlined' on 2.5.63. I
can't provide much more information at present.
scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 7:59 ` Zwane Mwaikambo
@ 2003-03-06 8:30 ` Mike Anderson
2003-03-06 8:35 ` Zwane Mwaikambo
2003-03-06 8:37 ` Mike Anderson
0 siblings, 2 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 8:30 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
>
Did this work in 2.5.62? The qlogicisp driver does have any error
handlers. Any error will cause a device offline state. You
should see a message at boot like:
ERROR: This is not a safe way to run your SCSI host
ERROR: The error handling must be added to this driver
This does not explain what is causing the error handler to start up or
do anything to help your problem.
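To illustrate why a driver without error handlers ends up with offlined devices, here is a minimal user-space sketch. It is not the kernel's code: the `demo_host_ops` structure, its field names, and `demo_recover` are hypothetical stand-ins, and the only behavior taken from this thread is that a host with no handlers has nothing to try, so any error leads to the offline state.

```c
#include <stddef.h>

/* Hypothetical, simplified host description: every handler is optional. */
struct demo_host_ops {
	int (*abort_cmd)(void);    /* each returns 1 if recovery succeeded */
	int (*device_reset)(void);
	int (*bus_reset)(void);
	int (*host_reset)(void);
};

enum demo_result { DEMO_RECOVERED, DEMO_OFFLINED };

/* Try each handler the driver provides, mildest first.
 * If none exists or none succeeds, the device is offlined. */
static enum demo_result demo_recover(const struct demo_host_ops *ops)
{
	int (*const steps[])(void) = {
		ops->abort_cmd, ops->device_reset,
		ops->bus_reset, ops->host_reset,
	};
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		if (steps[i] && steps[i]())
			return DEMO_RECOVERED;
	return DEMO_OFFLINED;
}

/* Hypothetical handler that always reports success. */
static int demo_reset_ok(void) { return 1; }
```

With all four pointers left NULL, `demo_recover` falls straight through to `DEMO_OFFLINED`; setting any one of them to `demo_reset_ok` yields `DEMO_RECOVERED`.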
We have been switching to the feral driver to handle the qlogic isp
card. This driver contains error handling routines. I believe the 2.5
version of the driver is in the -mm tree. I also believe Andrew has it
as a separate patch.
I did try running the qlogicisp driver and it appears to be loading for
me, but I do not have any non-disk devices on the system at the moment.
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 8:30 ` Mike Anderson
@ 2003-03-06 8:35 ` Zwane Mwaikambo
2003-03-06 8:55 ` Mike Anderson
2003-03-06 8:37 ` Mike Anderson
1 sibling, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 8:35 UTC (permalink / raw)
To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
On Thu, 6 Mar 2003, Mike Anderson wrote:
> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
> >
>
> Did this work in 2.5.62? The qlogicisp driver does have any error
> handlers. Any error will cause a device offline state. You
> should see a message at boot like:
> ERROR: This is not a safe way to run your SCSI host
> ERROR: The error handling must be added to this driver
That error was from booting 2.5.62, and I do get the warnings about
missing error handling.
> This does not explain what is causing the error handler to start up or
> do anything to help your problem.
I'm not concerned about that; it was peripheral damage from another
patch (which affected irq handling). The difference is that 2.5.62 boots
after printing those errors a couple of times, but 2.5.63 doesn't.
> We have been switching to the feral driver to handle the qlogic isp
> card. This driver contains error handling routines. I believe the 2.5
> version of the driver is in the -mm tree. I also believe Andrew has it
> as a separate patch.
>
> I did try running the qlogicisp driver and it appears to be loading for
> me, but I do not have any non-disk devices on the system at the moment.
I'm currently using it with the following devices, and it survives general
usage.
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 70 irq 41 MEM base 0xf8a16000
Vendor: IBM Model: DRHS36V Rev: 0270
Type: Direct-Access ANSI SCSI revision: 03
Vendor: IBM Model: DRHS36V Rev: 0270
Type: Direct-Access ANSI SCSI revision: 03
Vendor: PLEXTOR Model: CD-ROM PX-32CS Rev: 1.02
Type: CD-ROM ANSI SCSI revision: 02
SCSI device sda: 72170879 512-byte hdwr sectors (36951 MB)
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 72170879 512-byte hdwr sectors (36951 MB)
SCSI device sdb: drive cache: write through
sdb: unknown partition table
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sr0: scsi-1 drive
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 8:30 ` Mike Anderson
2003-03-06 8:35 ` Zwane Mwaikambo
@ 2003-03-06 8:37 ` Mike Anderson
1 sibling, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 8:37 UTC (permalink / raw)
To: Zwane Mwaikambo, Andries.Brouwer, torvalds, linux-kernel, linux-scsi
Mike Anderson [andmike@us.ibm.com] wrote:
> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
> >
>
> Did this work in 2.5.62? The qlogicisp driver does have any error
The above line should read "does not have any error"
> handlers. Any error will cause a device offline state. You
> should see a message at boot like:
> ERROR: This is not a safe way to run your SCSI host
> ERROR: The error handling must be added to this driver
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 8:35 ` Zwane Mwaikambo
@ 2003-03-06 8:55 ` Mike Anderson
2003-03-06 9:00 ` Zwane Mwaikambo
0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 8:55 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> I'm not concerned about that; it was peripheral damage from another
> patch (which affected irq handling). The difference is that 2.5.62 boots
> after printing those errors a couple of times, but 2.5.63 doesn't.
OK, I will keep looking at this. I believe I have a PLEXTOR CD drive in the
lab; I will add it to my qlogic isp bus and see if I can get the error
to show up. I am running CD drives on the other adapters and I am not
seeing a problem.
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 8:55 ` Mike Anderson
@ 2003-03-06 9:00 ` Zwane Mwaikambo
2003-03-06 9:18 ` Mike Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 9:00 UTC (permalink / raw)
To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
On Thu, 6 Mar 2003, Mike Anderson wrote:
> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > I'm not concerned about that; it was peripheral damage from another
> > patch (which affected irq handling). The difference is that 2.5.62 boots
> > after printing those errors a couple of times, but 2.5.63 doesn't.
>
> OK, I will keep looking at this. I believe I have a PLEXTOR CD drive in the
> lab; I will add it to my qlogic isp bus and see if I can get the error
> to show up. I am running CD drives on the other adapters and I am not
> seeing a problem.
My apologies, I think I wasn't being too clear. You won't be able to
replicate that exact error by default; I got it because I killed
interrupt routing/handling on the interrupt controllers servicing the bus
that the scsi controller is on. The errors generated by the SCSI layer
in turn kill the box in 2.5.63, whilst 2.5.62 only spews those errors and
continues booting.
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 9:00 ` Zwane Mwaikambo
@ 2003-03-06 9:18 ` Mike Anderson
2003-03-06 9:58 ` Zwane Mwaikambo
0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 9:18 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, Mike Anderson wrote:
>
> > Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > > I'm not concerned about that; it was peripheral damage from another
> > > patch (which affected irq handling). The difference is that 2.5.62 boots
> > > after printing those errors a couple of times, but 2.5.63 doesn't.
> >
> > OK, I will keep looking at this. I believe I have a PLEXTOR CD drive in the
> > lab; I will add it to my qlogic isp bus and see if I can get the error
> > to show up. I am running CD drives on the other adapters and I am not
> > seeing a problem.
>
> My apologies, I think I wasn't being too clear. You won't be able to
> replicate that exact error by default; I got it because I killed
> interrupt routing/handling on the interrupt controllers servicing the bus
> that the scsi controller is on. The errors generated by the SCSI layer
> in turn kill the box in 2.5.63, whilst 2.5.62 only spews those errors and
> continues booting.
Would it be possible for you to send me console output with
scsi_logging=1 so that I can narrow down the failure case?
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 9:18 ` Mike Anderson
@ 2003-03-06 9:58 ` Zwane Mwaikambo
2003-03-06 16:31 ` James Bottomley
0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 9:58 UTC (permalink / raw)
To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi
On Thu, 6 Mar 2003, Mike Anderson wrote:
> Would it be possible for you to send me console output with
> scsi_logging=1 so that I can narrow down the failure case?
The following is from 2.5.63-mjb2
http://function.linuxpower.ca/patches/numaq/dmesg-scsi_logging
The [disconnect] point is where it locks up
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 9:58 ` Zwane Mwaikambo
@ 2003-03-06 16:31 ` James Bottomley
2003-03-06 17:15 ` Zwane Mwaikambo
0 siblings, 1 reply; 21+ messages in thread
From: James Bottomley @ 2003-03-06 16:31 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
On Thu, 2003-03-06 at 03:58, Zwane Mwaikambo wrote:
> On Thu, 6 Mar 2003, Mike Anderson wrote:
>
> > Would it be possible for you to send me console output with
> > scsi_logging=1 so that I can narrow down the failure case?
>
> The following is from 2.5.63-mjb2
>
> http://function.linuxpower.ca/patches/numaq/dmesg-scsi_logging
This log implies the error handling finished after the BDR (bus device
reset). It looks like the system doesn't have Mike's latest patch for the
logic reversal problem in scsi_eh_ready_devs. Could you check this?
Thanks,
James
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 16:31 ` James Bottomley
@ 2003-03-06 17:15 ` Zwane Mwaikambo
2003-03-06 17:21 ` James Bottomley
2003-03-06 17:24 ` Mike Anderson
0 siblings, 2 replies; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 17:15 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
On Thu, 6 Mar 2003, James Bottomley wrote:
> This log implies the error handling finished after the BDR (bus device
> reset). It looks like the system doesn't have Mike's latest patch for the
> logic reversal problem in scsi_eh_ready_devs. Could you check this?
static void scsi_eh_ready_devs(struct Scsi_Host *shost,
			       struct list_head *work_q,
			       struct list_head *done_q)
{
	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
		if (scsi_eh_bus_reset(shost, work_q, done_q))
			if (scsi_eh_host_reset(work_q, done_q))
				scsi_eh_offline_sdevs(work_q, done_q);
}
That is what I currently have. I'll try a boot with:
- if (scsi_eh_bus_reset(shost, work_q, done_q))
+ if (!scsi_eh_bus_reset(shost, work_q, done_q))
Thanks,
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 17:15 ` Zwane Mwaikambo
@ 2003-03-06 17:21 ` James Bottomley
2003-03-06 17:39 ` Zwane Mwaikambo
2003-03-06 17:24 ` Mike Anderson
1 sibling, 1 reply; 21+ messages in thread
From: James Bottomley @ 2003-03-06 17:21 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
On Thu, 2003-03-06 at 11:15, Zwane Mwaikambo wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
>
> > This log implies the error handling finished after the BDR (bus device
> > reset). It looks like the system doesn't have Mike's latest patch for the
> > logic reversal problem in scsi_eh_ready_devs. Could you check this?
>
> static void scsi_eh_ready_devs(struct Scsi_Host *shost,
> 			       struct list_head *work_q,
> 			       struct list_head *done_q)
> {
> 	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> 		if (scsi_eh_bus_reset(shost, work_q, done_q))
> 			if (scsi_eh_host_reset(work_q, done_q))
> 				scsi_eh_offline_sdevs(work_q, done_q);
> }
>
> That is what I currently have. I'll try a boot with:
>
> - if (scsi_eh_bus_reset(shost, work_q, done_q))
> + if (!scsi_eh_bus_reset(shost, work_q, done_q))
>
> Thanks,
> Zwane
Actually, all three if's need nots in front:
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
+++ b/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 17:15 ` Zwane Mwaikambo
2003-03-06 17:21 ` James Bottomley
@ 2003-03-06 17:24 ` Mike Anderson
1 sibling, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 17:24 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: James Bottomley, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
>
> > This log implies the error handling finished after the BDR (bus device
> > reset). It looks like the system doesn't have Mike's latest patch for the
> > logic reversal problem in scsi_eh_ready_devs. Could you check this?
>
> static void scsi_eh_ready_devs(struct Scsi_Host *shost,
> 			       struct list_head *work_q,
> 			       struct list_head *done_q)
> {
> 	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> 		if (scsi_eh_bus_reset(shost, work_q, done_q))
> 			if (scsi_eh_host_reset(work_q, done_q))
> 				scsi_eh_offline_sdevs(work_q, done_q);
> }
>
> That is what I currently have. I'll try a boot with:
>
> - if (scsi_eh_bus_reset(shost, work_q, done_q))
> + if (!scsi_eh_bus_reset(shost, work_q, done_q))
>
This alone should not fix your problem. You should apply the whole patch, as the
reversed check on scsi_eh_bus_device_reset is what you should be
hitting.
The patch below should apply to your kernel version.
-andmike
--
Michael Anderson
andmike@us.ibm.com
=====
name: 00_scsi_error_ready_devs-1.diff
version: 2003-03-05.10:39:28-0800
against: 2.5.63
scsi_error.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
=====
===== drivers/scsi/scsi_error.c 1.38 vs edited =====
--- 1.38/drivers/scsi/scsi_error.c Sat Feb 22 08:17:01 2003
+++ edited/drivers/scsi/scsi_error.c Wed Mar 5 10:14:22 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 17:21 ` James Bottomley
@ 2003-03-06 17:39 ` Zwane Mwaikambo
2003-03-06 18:14 ` Mike Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 17:39 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
On Thu, 6 Mar 2003, James Bottomley wrote:
> Actually, all three if's need nots in front:
>
> diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> --- a/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
> +++ b/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
> @@ -1490,9 +1490,9 @@
>  			       struct list_head *work_q,
>  			       struct list_head *done_q)
>  {
> -	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> -		if (scsi_eh_bus_reset(shost, work_q, done_q))
> -			if (scsi_eh_host_reset(work_q, done_q))
> +	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
> +		if (!scsi_eh_bus_reset(shost, work_q, done_q))
> +			if (!scsi_eh_host_reset(work_q, done_q))
>  				scsi_eh_offline_sdevs(work_q, done_q);
>  }
OK, patched 2.5.63 is back to booting like 2.5.62. Would you like any more
information?
Thanks,
Zwane
--
function.linuxpower.ca
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 17:39 ` Zwane Mwaikambo
@ 2003-03-06 18:14 ` Mike Anderson
0 siblings, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 18:14 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: James Bottomley, Andries.Brouwer, torvalds, linux-kernel,
SCSI Mailing List
Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
>
> > Actually, all three if's need nots in front:
> >
> > diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > --- a/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
> > +++ b/drivers/scsi/scsi_error.c Thu Mar 6 11:21:22 2003
> > @@ -1490,9 +1490,9 @@
> >  			       struct list_head *work_q,
> >  			       struct list_head *done_q)
> >  {
> > -	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> > -		if (scsi_eh_bus_reset(shost, work_q, done_q))
> > -			if (scsi_eh_host_reset(work_q, done_q))
> > +	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
> > +		if (!scsi_eh_bus_reset(shost, work_q, done_q))
> > +			if (!scsi_eh_host_reset(work_q, done_q))
> >  				scsi_eh_offline_sdevs(work_q, done_q);
> >  }
>
> OK, patched 2.5.63 is back to booting like 2.5.62. Would you like any more
> information?
>
I believe we have all the information we need.
Thanks for sending the previous data and trying the patch.
I still need to understand the error signature for Andries, as it sounds
different than what you are seeing.
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06 9:22 Andries.Brouwer
0 siblings, 0 replies; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06 9:22 UTC (permalink / raw)
To: Andries.Brouwer, andmike; +Cc: linux-kernel, linux-scsi, torvalds
> Can you send me your console log.
Patience. Fourteen hours from now I'll look at this some more.
Andries
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 1:01 Andries.Brouwer
2003-03-06 1:13 ` Patrick Mansfield
2003-03-06 1:22 ` Linus Torvalds
@ 2003-03-06 4:15 ` Rob Radez
2 siblings, 0 replies; 21+ messages in thread
From: Rob Radez @ 2003-03-06 4:15 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi, torvalds
On Thu, Mar 06, 2003 at 02:01:38AM +0100, Andries.Brouwer@cwi.nl wrote:
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.
>
> A moment ago I looked at what goes wrong, and it turns out that
> scsi_error is activated
> [always a bad sign - I have never seen it do any good, and
> often see it crash the machine]
> and an infinite loop occurs, leaving the machine rather dead.
>
> (Total of 1 commands require eh work; scsi_unjam_host; requesting sense;
> scsi_eh_done: result 0) - infinite repeat.
>
> Have no time tonight to make a patch, but I suppose the author of
> the 2.5.63 scsi_error.c changes knows what she did wrong.
Even with the patch to scsi_error.c floating around, I still get the
same hang/infinite loop after the information for my scsi cd-rom is
printed on both 2.5.63 and .64.
Regards,
Rob Radez
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 1:01 Andries.Brouwer
2003-03-06 1:13 ` Patrick Mansfield
@ 2003-03-06 1:22 ` Linus Torvalds
2003-03-06 4:15 ` Rob Radez
2 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2003-03-06 1:22 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi
On Thu, 6 Mar 2003 Andries.Brouwer@cwi.nl wrote:
>
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.
>
> A moment ago I looked at what goes wrong, and it turns out that
> scsi_error is activated
See if this fixes it..
Linus
---
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1088 -> 1.1089
# drivers/scsi/scsi_error.c 1.38 -> 1.39
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/03/05 andmike@us.ibm.com 1.1089
# [PATCH] Fix SCSI error handler abort case
#
# I had my list empty checks reversed if aborting and bus device reset
# failed. The condition that causes the error handler to run is still
# unknown.
# --------------------------------------------
#
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c Wed Mar 5 17:21:56 2003
+++ b/drivers/scsi/scsi_error.c Wed Mar 5 17:21:56 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
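As an illustration of the reversed checks this patch corrects, here is a minimal user-space sketch, not the kernel code. It assumes, going by the changeset comment about "list empty checks", that each recovery stage returns nonzero once no failed commands remain on the work queue. The `pending` counter, `stage`, `offline_devs`, and the `ready_devs_*` functions are hypothetical stand-ins for the real queue and reset routines.

```c
/* Hypothetical stand-in for the work queue: commands still unrecovered. */
static int pending;
static int offlined;    /* times the last-resort offline step ran */
static int stages_run;  /* number of reset stages attempted */

/* One recovery stage: "recovers" up to `fixes` commands and returns
 * nonzero when nothing is left to recover (assumed convention). */
static int stage(int fixes)
{
	stages_run++;
	pending = (fixes >= pending) ? 0 : pending - fixes;
	return pending == 0;
}

static void offline_devs(void)
{
	offlined++;
	pending = 0;
}

/* Pre-patch shape: escalates only when the previous stage already
 * finished the work, and gives up while commands are still pending. */
static void ready_devs_reversed(int f1, int f2, int f3)
{
	if (stage(f1))
		if (stage(f2))
			if (stage(f3))
				offline_devs();
}

/* Patched shape: escalates only while commands remain unrecovered. */
static void ready_devs_fixed(int f1, int f2, int f3)
{
	if (!stage(f1))
		if (!stage(f2))
			if (!stage(f3))
				offline_devs();
}

static void reset_state(int n)
{
	pending = n;
	offlined = 0;
	stages_run = 0;
}
```

Under these assumptions, when no stage can recover the command (all arguments 0), the reversed version stops after the first stage with the command still pending, while the patched version walks through all three stages and then offlines the device.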
* Re: 2.5.63/64 do not boot: loop in scsi_error
2003-03-06 1:01 Andries.Brouwer
@ 2003-03-06 1:13 ` Patrick Mansfield
2003-03-06 1:22 ` Linus Torvalds
2003-03-06 4:15 ` Rob Radez
2 siblings, 0 replies; 21+ messages in thread
From: Patrick Mansfield @ 2003-03-06 1:13 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi, torvalds
Andries -
On Thu, Mar 06, 2003 at 02:01:38AM +0100, Andries.Brouwer@cwi.nl wrote:
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.
Did you try the patch to scsi_error.c Mike A. recently posted?
> [I can make 2.5.64 boot if I make sure no errors ever occur.
> That means that I must disable get_evpd_page, get_serialnumber,
> get_cachetype that my old stuff doesn't know about.
> If I do that all is well.]
That sucks - even if error handling recovers from them.
-- Patrick Mansfield
* 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06 1:01 Andries.Brouwer
2003-03-06 1:13 ` Patrick Mansfield
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06 1:01 UTC (permalink / raw)
To: linux-kernel, linux-scsi, torvalds
See that 2.5.64 came out - good. Time to send the next dev_t patch.
Unfortunately 2.5.63 and 2.5.64 do not boot.
A moment ago I looked at what goes wrong, and it turns out that
scsi_error is activated
[always a bad sign - I have never seen it do any good, and
often see it crash the machine]
and an infinite loop occurs, leaving the machine rather dead.
(Total of 1 commands require eh work; scsi_unjam_host; requesting sense;
scsi_eh_done: result 0) - infinite repeat.
Have no time tonight to make a patch, but I suppose the author of
the 2.5.63 scsi_error.c changes knows what she did wrong.
Andries
[I can make 2.5.64 boot if I make sure no errors ever occur.
That means that I must disable get_evpd_page, get_serialnumber,
get_cachetype that my old stuff doesn't know about.
If I do that all is well.]
end of thread, other threads:[~2003-03-06 18:02 UTC | newest]
Thread overview: 21+ messages
2003-03-06 6:39 2.5.63/64 do not boot: loop in scsi_error Andries.Brouwer
2003-03-06 6:49 ` Mike Anderson
2003-03-06 7:59 ` Zwane Mwaikambo
2003-03-06 8:30 ` Mike Anderson
2003-03-06 8:35 ` Zwane Mwaikambo
2003-03-06 8:55 ` Mike Anderson
2003-03-06 9:00 ` Zwane Mwaikambo
2003-03-06 9:18 ` Mike Anderson
2003-03-06 9:58 ` Zwane Mwaikambo
2003-03-06 16:31 ` James Bottomley
2003-03-06 17:15 ` Zwane Mwaikambo
2003-03-06 17:21 ` James Bottomley
2003-03-06 17:39 ` Zwane Mwaikambo
2003-03-06 18:14 ` Mike Anderson
2003-03-06 17:24 ` Mike Anderson
2003-03-06 8:37 ` Mike Anderson
-- strict thread matches above, loose matches on Subject: below --
2003-03-06 9:22 Andries.Brouwer
2003-03-06 1:01 Andries.Brouwer
2003-03-06 1:13 ` Patrick Mansfield
2003-03-06 1:22 ` Linus Torvalds
2003-03-06 4:15 ` Rob Radez