linux-kernel.vger.kernel.org archive mirror
* Re: 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06  6:39 Andries.Brouwer
  2003-03-06  6:49 ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06  6:39 UTC (permalink / raw)
  To: Andries.Brouwer, torvalds; +Cc: linux-kernel, linux-scsi

> See if this fixes it..

No, I am afraid not. My infinite loop does not pass through
scsi_eh_ready_devs().

Andries

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  6:39 2.5.63/64 do not boot: loop in scsi_error Andries.Brouwer
@ 2003-03-06  6:49 ` Mike Anderson
  2003-03-06  7:59   ` Zwane Mwaikambo
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06  6:49 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: torvalds, linux-kernel, linux-scsi

Andries.Brouwer@cwi.nl [Andries.Brouwer@cwi.nl] wrote:
> > See if this fixes it..
> 
> No, I am afraid not. My infinite loop does not pass through
> scsi_eh_ready_devs().
> 

Can you send me your console log? If you have scsi_logging=1, that would
be great also.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  6:49 ` Mike Anderson
@ 2003-03-06  7:59   ` Zwane Mwaikambo
  2003-03-06  8:30     ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06  7:59 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

On Wed, 5 Mar 2003, Mike Anderson wrote:

> Andries.Brouwer@cwi.nl [Andries.Brouwer@cwi.nl] wrote:
> > > See if this fixes it..
> > 
> > No, I am afraid not. My infinite loop does not pass through
> > scsi_eh_ready_devs().
> > 
> 
> Can you send me your console log? If you have scsi_logging=1, that would
> be great also.

It would help if you could figure out which paths this goes through, because
it locks up completely right before printing 'scsi: Device offlined' on 2.5.63. I
can't provide much more information at present.

scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  7:59   ` Zwane Mwaikambo
@ 2003-03-06  8:30     ` Mike Anderson
  2003-03-06  8:35       ` Zwane Mwaikambo
  2003-03-06  8:37       ` Mike Anderson
  0 siblings, 2 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06  8:30 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
> 

Did this work in 2.5.62? The qlogicisp driver does have any error
handlers. Any error will cause a device offline state. You
should see a message at boot like:
ERROR: This is not a safe way to run your SCSI host            
ERROR: The error handling must be added to this driver

This does not explain what is causing the error handler to start up or
do anything to help your problem.

We have been switching to the feral driver to handle the qlogic isp
card. This driver contains error handling routines. I believe the 2.5
version of the driver is in the -mm tree. I also believe Andrew has it
as a separate patch. 

I did try running the qlogicisp driver and it appears to be loading for
me, but I do not have any non-disk devices on the system at the moment.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread
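
[Editorial sketch] The point made in the message above, that a driver with no
error handlers gives the SCSI layer nothing to try before a device is offlined,
can be illustrated with a small stand-alone C example. This is not kernel code:
the struct eh_hooks type, the recover() function and the always_ok() helper are
hypothetical names invented here, and the convention that a hook returns
nonzero on success is an assumption made only for this illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a host driver's error-handling hooks. */
struct eh_hooks {
	int (*abort)(void);
	int (*device_reset)(void);
	int (*bus_reset)(void);
	int (*host_reset)(void);
};

enum eh_result { EH_RECOVERED, EH_OFFLINED };

/*
 * Try each hook the driver provides, mildest first. A hook returns nonzero
 * on success (assumed convention). If no hook exists or none succeeds, the
 * only remaining option is to take the device offline.
 */
static enum eh_result recover(const struct eh_hooks *h)
{
	int (*const steps[])(void) = {
		h->abort, h->device_reset, h->bus_reset, h->host_reset
	};
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		if (steps[i] && steps[i]())
			return EH_RECOVERED;
	return EH_OFFLINED;
}

/* Example hook that always reports success. */
static int always_ok(void)
{
	return 1;
}
```

With every hook left NULL, recover() falls straight through to EH_OFFLINED,
which mirrors the "any error will cause a device offline state" behaviour
described above for a driver without handlers.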

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  8:30     ` Mike Anderson
@ 2003-03-06  8:35       ` Zwane Mwaikambo
  2003-03-06  8:55         ` Mike Anderson
  2003-03-06  8:37       ` Mike Anderson
  1 sibling, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06  8:35 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

On Thu, 6 Mar 2003, Mike Anderson wrote:

> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
> > 
> 
> Did this work in 2.5.62? The qlogicisp driver does have any error
> handlers. Any error will cause a device offline state. You
> should see a message at boot like:
> ERROR: This is not a safe way to run your SCSI host            
> ERROR: The error handling must be added to this driver

That error was from a booting 2.5.62, and I do get the warnings about
missing error handling.

> This does not explain what is causing the error handler to start up or
> do anything to help your problem.

I'm not concerned about that; it was peripheral damage from another
patch (which affected irq handling). The difference is that 2.5.62 boots
after printing those errors a couple of times, but 2.5.63 doesn't.
 
> We have been switching to the feral driver to handle the qlogic isp
> card. This driver contains error handling routines. I believe the 2.5
> version of the driver is in the -mm tree. I also believe Andrew has it
> as a separate patch. 
> 
> I did try running the qlogicisp driver and it appears to be loading for
> me, but I do not have any non-disk devices on the system at the moment.

I'm currently using it with the following devices and it survives general
usage.

scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 70 irq 41 MEM base 0xf8a16000
  Vendor: IBM       Model: DRHS36V           Rev: 0270
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: DRHS36V           Rev: 0270
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: PLEXTOR   Model: CD-ROM PX-32CS    Rev: 1.02
  Type:   CD-ROM                             ANSI SCSI revision: 02
SCSI device sda: 72170879 512-byte hdwr sectors (36951 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 72170879 512-byte hdwr sectors (36951 MB)
SCSI device sdb: drive cache: write through
 sdb: unknown partition table
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sr0: scsi-1 drive

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  8:30     ` Mike Anderson
  2003-03-06  8:35       ` Zwane Mwaikambo
@ 2003-03-06  8:37       ` Mike Anderson
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06  8:37 UTC (permalink / raw)
  To: Zwane Mwaikambo, Andries.Brouwer, torvalds, linux-kernel, linux-scsi

Mike Anderson [andmike@us.ibm.com] wrote:
> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > scsi1 : QLogic ISP1020 SCSI on PCI bus 04 device 70 irq 89 MEM base 0xf8a18000
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 0 lun 0
> > scsi: Device offlined - not ready or command retry failed after error recovery: host 1 channel 0 id 1 lun 0
> > 
> 
> Did this work in 2.5.62? The qlogicisp driver does have any error
The above line should read "does not have any error"
> handlers. Any error will cause a device offline state. You
> should see a message at boot like:
> ERROR: This is not a safe way to run your SCSI host            
> ERROR: The error handling must be added to this driver

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  8:35       ` Zwane Mwaikambo
@ 2003-03-06  8:55         ` Mike Anderson
  2003-03-06  9:00           ` Zwane Mwaikambo
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06  8:55 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> I'm not concerned about that; it was peripheral damage from another
> patch (which affected irq handling). The difference is that 2.5.62 boots
> after printing those errors a couple of times, but 2.5.63 doesn't.

OK, I will keep looking at this. I believe I have a PLEXTOR CD in the
lab; I will add it to my qlogic isp bus and see if I can get the error
to show up. I am running CD drives on the other adapters and I am not
seeing a problem.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  8:55         ` Mike Anderson
@ 2003-03-06  9:00           ` Zwane Mwaikambo
  2003-03-06  9:18             ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06  9:00 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

On Thu, 6 Mar 2003, Mike Anderson wrote:

> Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > I'm not concerned about that; it was peripheral damage from another
> > patch (which affected irq handling). The difference is that 2.5.62 boots
> > after printing those errors a couple of times, but 2.5.63 doesn't.
> 
> OK, I will keep looking at this. I believe I have a PLEXTOR CD in the
> lab; I will add it to my qlogic isp bus and see if I can get the error
> to show up. I am running CD drives on the other adapters and I am not
> seeing a problem.

My apologies, I think I wasn't being too clear. You won't be able to
replicate that exact error by default; I got it because I killed
interrupt routing/handling on the interrupt controllers servicing the bus
that the SCSI controller is on. The errors generated by the SCSI layer
in turn kill the box in 2.5.63, whereas 2.5.62 only spews those errors and
continues booting.

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  9:00           ` Zwane Mwaikambo
@ 2003-03-06  9:18             ` Mike Anderson
  2003-03-06  9:58               ` Zwane Mwaikambo
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2003-03-06  9:18 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, Mike Anderson wrote:
> 
> > Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> > > I'm not concerned about that; it was peripheral damage from another
> > > patch (which affected irq handling). The difference is that 2.5.62 boots
> > > after printing those errors a couple of times, but 2.5.63 doesn't.
> > 
> > OK, I will keep looking at this. I believe I have a PLEXTOR CD in the
> > lab; I will add it to my qlogic isp bus and see if I can get the error
> > to show up. I am running CD drives on the other adapters and I am not
> > seeing a problem.
> 
> My apologies, I think I wasn't being too clear. You won't be able to
> replicate that exact error by default; I got it because I killed
> interrupt routing/handling on the interrupt controllers servicing the bus
> that the SCSI controller is on. The errors generated by the SCSI layer
> in turn kill the box in 2.5.63, whereas 2.5.62 only spews those errors and
> continues booting.

Would it be possible for you to send me console output with
scsi_logging=1 so that I can narrow down the failure case?

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  9:18             ` Mike Anderson
@ 2003-03-06  9:58               ` Zwane Mwaikambo
  2003-03-06 16:31                 ` James Bottomley
  0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06  9:58 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Andries.Brouwer, torvalds, linux-kernel, linux-scsi

On Thu, 6 Mar 2003, Mike Anderson wrote:

> Would it be possible for you to send me console output with
> scsi_logging=1 so that I can narrow down the failure case?

The following is from 2.5.63-mjb2

http://function.linuxpower.ca/patches/numaq/dmesg-scsi_logging

The [disconnect] point is where it locks up

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  9:58               ` Zwane Mwaikambo
@ 2003-03-06 16:31                 ` James Bottomley
  2003-03-06 17:15                   ` Zwane Mwaikambo
  0 siblings, 1 reply; 21+ messages in thread
From: James Bottomley @ 2003-03-06 16:31 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

On Thu, 2003-03-06 at 03:58, Zwane Mwaikambo wrote:
> On Thu, 6 Mar 2003, Mike Anderson wrote:
> 
> > Would it be possible for you to send me console output with
> > scsi_logging=1 so that I can narrow down the failure case?
> 
> The following is from 2.5.63-mjb2
> 
> http://function.linuxpower.ca/patches/numaq/dmesg-scsi_logging

This log implies the error handling finished after the BDR.  That looks
like the system doesn't have Mike's latest patch for the logic reversal
problem in scsi_eh_ready_devs, could you check this?

Thanks,

James



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06 16:31                 ` James Bottomley
@ 2003-03-06 17:15                   ` Zwane Mwaikambo
  2003-03-06 17:21                     ` James Bottomley
  2003-03-06 17:24                     ` Mike Anderson
  0 siblings, 2 replies; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 17:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

On Thu, 6 Mar 2003, James Bottomley wrote:

> This log implies the error handling finished after the BDR.  That looks
> like the system doesn't have Mike's latest patch for the logic reversal
> problem in scsi_eh_ready_devs, could you check this?

static void scsi_eh_ready_devs(struct Scsi_Host *shost,
                              struct list_head *work_q,
                              struct list_head *done_q)
{
       if (scsi_eh_bus_device_reset(shost, work_q, done_q))
               if (scsi_eh_bus_reset(shost, work_q, done_q))
                       if (scsi_eh_host_reset(work_q, done_q))
                               scsi_eh_offline_sdevs(work_q, done_q);
}

That is what I currently have; I'll try a boot with:

-               if (scsi_eh_bus_reset(shost, work_q, done_q))
+               if (!scsi_eh_bus_reset(shost, work_q, done_q))

Thanks,
	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06 17:15                   ` Zwane Mwaikambo
@ 2003-03-06 17:21                     ` James Bottomley
  2003-03-06 17:39                       ` Zwane Mwaikambo
  2003-03-06 17:24                     ` Mike Anderson
  1 sibling, 1 reply; 21+ messages in thread
From: James Bottomley @ 2003-03-06 17:21 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

On Thu, 2003-03-06 at 11:15, Zwane Mwaikambo wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
> 
> > This log implies the error handling finished after the BDR.  That looks
> > like the system doesn't have Mike's latest patch for the logic reversal
> > problem in scsi_eh_ready_devs, could you check this?
> 
> static void scsi_eh_ready_devs(struct Scsi_Host *shost,
>                               struct list_head *work_q,
>                               struct list_head *done_q)
> {
>        if (scsi_eh_bus_device_reset(shost, work_q, done_q))
>                if (scsi_eh_bus_reset(shost, work_q, done_q))
>                        if (scsi_eh_host_reset(work_q, done_q))
>                                scsi_eh_offline_sdevs(work_q, done_q);
> }
> 
> That is what I currently have; I'll try a boot with:
> 
> -               if (scsi_eh_bus_reset(shost, work_q, done_q))
> +               if (!scsi_eh_bus_reset(shost, work_q, done_q))
> 
> Thanks,
> 	Zwane


Actually, all three if's need nots in front:

diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
+++ b/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
 


^ permalink raw reply	[flat|nested] 21+ messages in thread
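
[Editorial sketch] The "logic reversal" fixed by the patch above can be
illustrated with a small stand-alone C example. This is not kernel code: the
stage_* functions, the pending counter and the offlined flag are hypothetical
stand-ins for the scsi_eh_* helpers and work_q. It assumes, following the
changelog wording "list empty checks reversed", that each recovery stage
returns nonzero once no commands remain to be recovered, and it arbitrarily
lets each stage recover one command.

```c
/* Hypothetical stand-in for work_q: commands still needing recovery. */
static int pending;
/* Set when the last-resort step (offlining devices) runs. */
static int offlined;

/* Each stage recovers at most one command and returns nonzero when
 * nothing is left to recover (assumed return convention). */
static int stage_device_reset(void) { if (pending > 0) pending--; return pending == 0; }
static int stage_bus_reset(void)    { if (pending > 0) pending--; return pending == 0; }
static int stage_host_reset(void)   { if (pending > 0) pending--; return pending == 0; }
static void stage_offline(void)     { offlined = 1; pending = 0; }

/* Checks as in the unpatched function: escalates only after a stage
 * has already emptied the queue, and stops while work remains. */
static void ready_devs_reversed(void)
{
	if (stage_device_reset())
		if (stage_bus_reset())
			if (stage_host_reset())
				stage_offline();
}

/* Checks as in the patch: escalates only while commands remain. */
static void ready_devs_fixed(void)
{
	if (!stage_device_reset())
		if (!stage_bus_reset())
			if (!stage_host_reset())
				stage_offline();
}
```

Under these assumptions the reversed version stops after the first stage
whenever commands are still pending, leaving them unrecovered, and it offlines
devices in exactly the case where the first stage already recovered everything.
The patched version escalates stage by stage and offlines only as a last resort.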

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06 17:15                   ` Zwane Mwaikambo
  2003-03-06 17:21                     ` James Bottomley
@ 2003-03-06 17:24                     ` Mike Anderson
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 17:24 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: James Bottomley, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
> 
> > This log implies the error handling finished after the BDR.  That looks
> > like the system doesn't have Mike's latest patch for the logic reversal
> > problem in scsi_eh_ready_devs, could you check this?
> 
> static void scsi_eh_ready_devs(struct Scsi_Host *shost,
>                               struct list_head *work_q,
>                               struct list_head *done_q)
> {
>        if (scsi_eh_bus_device_reset(shost, work_q, done_q))
>                if (scsi_eh_bus_reset(shost, work_q, done_q))
>                        if (scsi_eh_host_reset(work_q, done_q))
>                                scsi_eh_offline_sdevs(work_q, done_q);
> }
> 
> That is what I currently have; I'll try a boot with:
> 
> -               if (scsi_eh_bus_reset(shost, work_q, done_q))
> +               if (!scsi_eh_bus_reset(shost, work_q, done_q))
> 

This alone should not fix your problem. You should apply the whole patch, as the
reversed check on scsi_eh_bus_device_reset is what you should be
hitting.

The patch below should apply to your kernel version.

-andmike
--
Michael Anderson
andmike@us.ibm.com


=====
name:		00_scsi_error_ready_devs-1.diff
version:	2003-03-05.10:39:28-0800
against:	2.5.63

 scsi_error.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

=====
===== drivers/scsi/scsi_error.c 1.38 vs edited =====
--- 1.38/drivers/scsi/scsi_error.c	Sat Feb 22 08:17:01 2003
+++ edited/drivers/scsi/scsi_error.c	Wed Mar  5 10:14:22 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06 17:21                     ` James Bottomley
@ 2003-03-06 17:39                       ` Zwane Mwaikambo
  2003-03-06 18:14                         ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2003-03-06 17:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

On Thu, 6 Mar 2003, James Bottomley wrote:

> Actually, all three if's need nots in front:
> 
> diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> --- a/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
> +++ b/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
> @@ -1490,9 +1490,9 @@
>  			       struct list_head *work_q,
>  			       struct list_head *done_q)
>  {
> -	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> -		if (scsi_eh_bus_reset(shost, work_q, done_q))
> -			if (scsi_eh_host_reset(work_q, done_q))
> +	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
> +		if (!scsi_eh_bus_reset(shost, work_q, done_q))
> +			if (!scsi_eh_host_reset(work_q, done_q))
>  				scsi_eh_offline_sdevs(work_q, done_q);
>  }

OK, patched 2.5.63 is back to booting like 2.5.62. Would you like any more
information?

Thanks,
	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06 17:39                       ` Zwane Mwaikambo
@ 2003-03-06 18:14                         ` Mike Anderson
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2003-03-06 18:14 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: James Bottomley, Andries.Brouwer, torvalds, linux-kernel,
	SCSI Mailing List

Zwane Mwaikambo [zwane@linuxpower.ca] wrote:
> On Thu, 6 Mar 2003, James Bottomley wrote:
> 
> > Actually, all three if's need nots in front:
> > 
> > diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > --- a/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
> > +++ b/drivers/scsi/scsi_error.c	Thu Mar  6 11:21:22 2003
> > @@ -1490,9 +1490,9 @@
> >  			       struct list_head *work_q,
> >  			       struct list_head *done_q)
> >  {
> > -	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
> > -		if (scsi_eh_bus_reset(shost, work_q, done_q))
> > -			if (scsi_eh_host_reset(work_q, done_q))
> > +	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
> > +		if (!scsi_eh_bus_reset(shost, work_q, done_q))
> > +			if (!scsi_eh_host_reset(work_q, done_q))
> >  				scsi_eh_offline_sdevs(work_q, done_q);
> >  }
> 
> OK, patched 2.5.63 is back to booting like 2.5.62. Would you like any more
> information?
> 

I believe we have all the information we need.

Thanks for sending the previous data and trying the patch.

I still need to understand the error signature for Andries, as it sounds
different than what you are seeing.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06  9:22 Andries.Brouwer
  0 siblings, 0 replies; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06  9:22 UTC (permalink / raw)
  To: Andries.Brouwer, andmike; +Cc: linux-kernel, linux-scsi, torvalds

> Can you send me your console log.

Patience. Fourteen hours from now I'll look at this some more.

Andries

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  1:01 Andries.Brouwer
  2003-03-06  1:13 ` Patrick Mansfield
  2003-03-06  1:22 ` Linus Torvalds
@ 2003-03-06  4:15 ` Rob Radez
  2 siblings, 0 replies; 21+ messages in thread
From: Rob Radez @ 2003-03-06  4:15 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi, torvalds

On Thu, Mar 06, 2003 at 02:01:38AM +0100, Andries.Brouwer@cwi.nl wrote:
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.
> 
> A moment ago I looked at what goes wrong, and it turns out that
> scsi_error is activated
>   [always a bad sign - I have never seen it do any good, and
>    often see it crash the machine]
> and an infinite loop occurs, leaving the machine rather dead.
> 
> (Total of 1 commands require eh work; scsi_unjam_host; requesting sense;
>  scsi_eh_done: result 0) - infinite repeat.
> 
> Have no time tonight to make a patch, but I suppose the author of
> the 2.5.63 scsi_error.c changes knows what she did wrong.

Even with the patch to scsi_error.c floating around, I still get the
same hang/infinite loop after the information for my scsi cd-rom is
printed on both 2.5.63 and .64.

Regards,
Rob Radez

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  1:01 Andries.Brouwer
  2003-03-06  1:13 ` Patrick Mansfield
@ 2003-03-06  1:22 ` Linus Torvalds
  2003-03-06  4:15 ` Rob Radez
  2 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2003-03-06  1:22 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi


On Thu, 6 Mar 2003 Andries.Brouwer@cwi.nl wrote:
>
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.
> 
> A moment ago I looked at what goes wrong, and it turns out that
> scsi_error is activated

See if this fixes it..

		Linus

---
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1088  -> 1.1089 
#	drivers/scsi/scsi_error.c	1.38    -> 1.39   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/03/05	andmike@us.ibm.com	1.1089
# [PATCH] Fix SCSI error handler abort case
# 
# I had my list empty checks reversed if aborting and bus device reset
# failed.  The condition that causes the error handler to run is still
# unknown.
# --------------------------------------------
#
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Wed Mar  5 17:21:56 2003
+++ b/drivers/scsi/scsi_error.c	Wed Mar  5 17:21:56 2003
@@ -1490,9 +1490,9 @@
 			       struct list_head *work_q,
 			       struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.63/64 do not boot: loop in scsi_error
  2003-03-06  1:01 Andries.Brouwer
@ 2003-03-06  1:13 ` Patrick Mansfield
  2003-03-06  1:22 ` Linus Torvalds
  2003-03-06  4:15 ` Rob Radez
  2 siblings, 0 replies; 21+ messages in thread
From: Patrick Mansfield @ 2003-03-06  1:13 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi, torvalds

Andries -

On Thu, Mar 06, 2003 at 02:01:38AM +0100, Andries.Brouwer@cwi.nl wrote:
> See that 2.5.64 came out - good. Time to send the next dev_t patch.
> Unfortunately 2.5.63 and 2.5.64 do not boot.

Did you try the patch to scsi_error.c Mike A. recently posted?

> [I can make 2.5.64 boot if I make sure no errors ever occur.
> That means that I must disable get_evpd_page, get_serialnumber,
> get_cachetype that my old stuff doesn't know about.
> If I do that all is well.]

That sucks - even if error handling recovers from them.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 21+ messages in thread

* 2.5.63/64 do not boot: loop in scsi_error
@ 2003-03-06  1:01 Andries.Brouwer
  2003-03-06  1:13 ` Patrick Mansfield
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andries.Brouwer @ 2003-03-06  1:01 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, torvalds

See that 2.5.64 came out - good. Time to send the next dev_t patch.
Unfortunately 2.5.63 and 2.5.64 do not boot.

A moment ago I looked at what goes wrong, and it turns out that
scsi_error is activated
  [always a bad sign - I have never seen it do any good, and
   often see it crash the machine]
and an infinite loop occurs, leaving the machine rather dead.

(Total of 1 commands require eh work; scsi_unjam_host; requesting sense;
 scsi_eh_done: result 0) - infinite repeat.

Have no time tonight to make a patch, but I suppose the author of
the 2.5.63 scsi_error.c changes knows what she did wrong.

Andries


[I can make 2.5.64 boot if I make sure no errors ever occur.
That means that I must disable get_evpd_page, get_serialnumber,
get_cachetype that my old stuff doesn't know about.
If I do that all is well.]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2003-03-06 18:02 UTC | newest]

Thread overview: 21+ messages
2003-03-06  6:39 2.5.63/64 do not boot: loop in scsi_error Andries.Brouwer
2003-03-06  6:49 ` Mike Anderson
2003-03-06  7:59   ` Zwane Mwaikambo
2003-03-06  8:30     ` Mike Anderson
2003-03-06  8:35       ` Zwane Mwaikambo
2003-03-06  8:55         ` Mike Anderson
2003-03-06  9:00           ` Zwane Mwaikambo
2003-03-06  9:18             ` Mike Anderson
2003-03-06  9:58               ` Zwane Mwaikambo
2003-03-06 16:31                 ` James Bottomley
2003-03-06 17:15                   ` Zwane Mwaikambo
2003-03-06 17:21                     ` James Bottomley
2003-03-06 17:39                       ` Zwane Mwaikambo
2003-03-06 18:14                         ` Mike Anderson
2003-03-06 17:24                     ` Mike Anderson
2003-03-06  8:37       ` Mike Anderson
  -- strict thread matches above, loose matches on Subject: below --
2003-03-06  9:22 Andries.Brouwer
2003-03-06  1:01 Andries.Brouwer
2003-03-06  1:13 ` Patrick Mansfield
2003-03-06  1:22 ` Linus Torvalds
2003-03-06  4:15 ` Rob Radez
