linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
@ 2007-06-14 15:31 Neil Horman
  2007-06-14 15:59 ` Randy Dunlap
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Neil Horman @ 2007-06-14 15:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: mike.miller, iss_storagedev, akpm, nhorman

Hey -
	cciss hardware currently can continue to send responses to scsi commands
after the host system has undergone a kexec reboot.  The way the drier is
currently written, reception of these commands results in a BUG halt, since it
can't match the response to any issued command since the boot.  This patch
corrects that by using the kexec reset_devices command line paramter to force
ignore any commands that it cant correlate.

Regards
Neil

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 cciss.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 5acc6c4..ec1c1d2 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 cmd, int ctlr, unsigned long complete)
 		       ctlr, complete);
 		/* not much we can do. */
 #ifdef CONFIG_CISS_SCSI_TAPE
+		/* We might get notification of completion of commands
+		 * which we never issued in this kernel if this boot is
+		 * taking place after previous kernel's crash. Simply
+		 * ignore the commands in this case.
+		 */
+		if (reset_devices)
+			return 0;
+
 		return 1;
 	}
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 15:31 [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot Neil Horman
@ 2007-06-14 15:59 ` Randy Dunlap
  2007-06-14 16:44   ` Neil Horman
  2007-06-14 18:16 ` Miller, Mike (OS Dev)
  2007-07-03 15:59 ` Miller, Mike (OS Dev)
  2 siblings, 1 reply; 8+ messages in thread
From: Randy Dunlap @ 2007-06-14 15:59 UTC (permalink / raw)
  To: Neil Horman; +Cc: linux-kernel, mike.miller, iss_storagedev, akpm

On Thu, 14 Jun 2007 11:31:19 -0400 Neil Horman wrote:

> Hey -
> 	cciss hardware currently can continue to send responses to scsi commands
> after the host system has undergone a kexec reboot.  The way the drier is
> currently written, reception of these commands results in a BUG halt, since it
> can't match the response to any issued command since the boot.  This patch
> corrects that by using the kexec reset_devices command line paramter to force
> ignore any commands that it cant correlate.
> 
> Regards
> Neil
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  cciss.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index 5acc6c4..ec1c1d2 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 cmd, int ctlr, unsigned long complete)
>  		       ctlr, complete);
>  		/* not much we can do. */
>  #ifdef CONFIG_CISS_SCSI_TAPE
> +		/* We might get notification of completion of commands
> +		 * which we never issued in this kernel if this boot is
> +		 * taking place after previous kernel's crash. Simply
> +		 * ignore the commands in this case.
> +		 */
> +		if (reset_devices)
> +			return 0;
> +
>  		return 1;
>  	}

But this patch applies only to SCSI tape devices, not to
disk devices?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 15:59 ` Randy Dunlap
@ 2007-06-14 16:44   ` Neil Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Neil Horman @ 2007-06-14 16:44 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, mike.miller, iss_storagedev, akpm

On Thu, Jun 14, 2007 at 08:59:24AM -0700, Randy Dunlap wrote:
> On Thu, 14 Jun 2007 11:31:19 -0400 Neil Horman wrote:
> 
> > Hey -
> > 	cciss hardware currently can continue to send responses to scsi commands
> > after the host system has undergone a kexec reboot.  The way the drier is
> > currently written, reception of these commands results in a BUG halt, since it
> > can't match the response to any issued command since the boot.  This patch
> > corrects that by using the kexec reset_devices command line paramter to force
> > ignore any commands that it cant correlate.
> > 
> > Regards
> > Neil
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  cciss.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > 
> > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> > index 5acc6c4..ec1c1d2 100644
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 cmd, int ctlr, unsigned long complete)
> >  		       ctlr, complete);
> >  		/* not much we can do. */
> >  #ifdef CONFIG_CISS_SCSI_TAPE
> > +		/* We might get notification of completion of commands
> > +		 * which we never issued in this kernel if this boot is
> > +		 * taking place after previous kernel's crash. Simply
> > +		 * ignore the commands in this case.
> > +		 */
> > +		if (reset_devices)
> > +			return 0;
> > +
> >  		return 1;
> >  	}
> 
> But this patch applies only to SCSI tape devices, not to
> disk devices?
> 
No it only applies if CONFIG_CISS_SCSI_TAPE is defined (which is different than
only being applicable to scsi tape devices.  I'm not sure why the cciss driver
ifdefs the relevant code to only support TAPE devices, but I figured that wasn't
as relevant to this problem.  Sufice it to say, that all scsi commands to all
cciss devices pass through this code if CONFIG_CISS_SCSI_TAPE is defined, and if
its not defined, add_sendcmd_reject always returns 0

Thanks & Regards
Neil

> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@tuxdriver.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 15:31 [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot Neil Horman
  2007-06-14 15:59 ` Randy Dunlap
@ 2007-06-14 18:16 ` Miller, Mike (OS Dev)
  2007-06-14 19:25   ` Neil Horman
  2007-07-03 15:59 ` Miller, Mike (OS Dev)
  2 siblings, 1 reply; 8+ messages in thread
From: Miller, Mike (OS Dev) @ 2007-06-14 18:16 UTC (permalink / raw)
  To: Neil Horman, linux-kernel; +Cc: ISS StorageDev, akpm

 

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> Sent: Thursday, June 14, 2007 10:31 AM
> To: linux-kernel@vger.kernel.org
> Cc: Miller, Mike (OS Dev); ISS StorageDev; 
> akpm@linux-foundation.org; nhorman@tuxdriver.com
> Subject: [PATCH] cciss: force ignore of responses to unsent 
> scsi commands after kexec reboot
> 
> Hey -
> 	cciss hardware currently can continue to send responses 
> to scsi commands after the host system has undergone a kexec 
> reboot.  The way the drier is currently written, reception of 
> these commands results in a BUG halt, since it can't match 
> the response to any issued command since the boot.  This 
> patch corrects that by using the kexec reset_devices command 
> line paramter to force ignore any commands that it cant correlate.
> 
> Regards
> Neil
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  cciss.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c 
> index 5acc6c4..ec1c1d2 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 
> cmd, int ctlr, unsigned long complete)
>  		       ctlr, complete);
>  		/* not much we can do. */
>  #ifdef CONFIG_CISS_SCSI_TAPE
> +		/* We might get notification of completion of commands
> +		 * which we never issued in this kernel if this boot is
> +		 * taking place after previous kernel's crash. Simply
> +		 * ignore the commands in this case.
> +		 */
> +		if (reset_devices)
> +			return 0;
> +
>  		return 1;
>  	}
>  
I don't understand how this will help. We need to reset the controller
which reset_devices cannot do alone. I just haven't have the time to
implement the fix yet.

mikem 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 18:16 ` Miller, Mike (OS Dev)
@ 2007-06-14 19:25   ` Neil Horman
  2007-06-18  4:30     ` Vivek Goyal
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Horman @ 2007-06-14 19:25 UTC (permalink / raw)
  To: Miller, Mike (OS Dev); +Cc: linux-kernel, ISS StorageDev, akpm

On Thu, Jun 14, 2007 at 06:16:03PM -0000, Miller, Mike (OS Dev) wrote:
>  
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> > Sent: Thursday, June 14, 2007 10:31 AM
> > To: linux-kernel@vger.kernel.org
> > Cc: Miller, Mike (OS Dev); ISS StorageDev; 
> > akpm@linux-foundation.org; nhorman@tuxdriver.com
> > Subject: [PATCH] cciss: force ignore of responses to unsent 
> > scsi commands after kexec reboot
> > 
> > Hey -
> > 	cciss hardware currently can continue to send responses 
> > to scsi commands after the host system has undergone a kexec 
> > reboot.  The way the drier is currently written, reception of 
> > these commands results in a BUG halt, since it can't match 
> > the response to any issued command since the boot.  This 
> > patch corrects that by using the kexec reset_devices command 
> > line paramter to force ignore any commands that it cant correlate.
> > 
> > Regards
> > Neil
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  cciss.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > 
> > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c 
> > index 5acc6c4..ec1c1d2 100644
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 
> > cmd, int ctlr, unsigned long complete)
> >  		       ctlr, complete);
> >  		/* not much we can do. */
> >  #ifdef CONFIG_CISS_SCSI_TAPE
> > +		/* We might get notification of completion of commands
> > +		 * which we never issued in this kernel if this boot is
> > +		 * taking place after previous kernel's crash. Simply
> > +		 * ignore the commands in this case.
> > +		 */
> > +		if (reset_devices)
> > +			return 0;
> > +
> >  		return 1;
> >  	}
> >  
> I don't understand how this will help. We need to reset the controller
> which reset_devices cannot do alone. I just haven't have the time to
> implement the fix yet.
> 
> mikem 

I definately agree.  Actually resetting the hardware so that odd responses would
never be received would be a much better solution.  However, when this problem
(and the above corresponding workaround to fix it) was first proposed almost a
year ago:
http://www.ussg.iu.edu/hypermail/linux/kernel/0606.2/3055.html

It was met with no action.  I understand that actually doing a reset of the
hardware is a much better solution, but I'm certainly not knoweldgeable enough,
nor do I have the documentation needed to implement that solution.  Until it is,
this patch lets kexec work properly on this hardware, which I think is a good
trade until such time as the proper fix is implemented.

Thanks & Regards
Neil

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@tuxdriver.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 19:25   ` Neil Horman
@ 2007-06-18  4:30     ` Vivek Goyal
  2007-06-18 16:39       ` Miller, Mike (OS Dev)
  0 siblings, 1 reply; 8+ messages in thread
From: Vivek Goyal @ 2007-06-18  4:30 UTC (permalink / raw)
  To: Neil Horman; +Cc: Miller, Mike (OS Dev), linux-kernel, ISS StorageDev, akpm

On Thu, Jun 14, 2007 at 03:25:23PM -0400, Neil Horman wrote:
> On Thu, Jun 14, 2007 at 06:16:03PM -0000, Miller, Mike (OS Dev) wrote:
> >  
> > 
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> > > Sent: Thursday, June 14, 2007 10:31 AM
> > > To: linux-kernel@vger.kernel.org
> > > Cc: Miller, Mike (OS Dev); ISS StorageDev; 
> > > akpm@linux-foundation.org; nhorman@tuxdriver.com
> > > Subject: [PATCH] cciss: force ignore of responses to unsent 
> > > scsi commands after kexec reboot
> > > 
> > > Hey -
> > > 	cciss hardware currently can continue to send responses 
> > > to scsi commands after the host system has undergone a kexec 
> > > reboot.  The way the drier is currently written, reception of 
> > > these commands results in a BUG halt, since it can't match 
> > > the response to any issued command since the boot.  This 
> > > patch corrects that by using the kexec reset_devices command 
> > > line paramter to force ignore any commands that it cant correlate.
> > > 
> > > Regards
> > > Neil
> > > 
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > 
> > > 
> > >  cciss.c |    8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > 
> > > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c 
> > > index 5acc6c4..ec1c1d2 100644
> > > --- a/drivers/block/cciss.c
> > > +++ b/drivers/block/cciss.c
> > > @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 
> > > cmd, int ctlr, unsigned long complete)
> > >  		       ctlr, complete);
> > >  		/* not much we can do. */
> > >  #ifdef CONFIG_CISS_SCSI_TAPE
> > > +		/* We might get notification of completion of commands
> > > +		 * which we never issued in this kernel if this boot is
> > > +		 * taking place after previous kernel's crash. Simply
> > > +		 * ignore the commands in this case.
> > > +		 */
> > > +		if (reset_devices)
> > > +			return 0;
> > > +
> > >  		return 1;
> > >  	}

I think this is not the right usage of reset_devices parameter. This
parameter instructs the driver to reset the device before going ahead
with rest of the initialization before as underlying device might not
be in a sane state. kexec/kdump is one of the usages and this can also
be useful in the case of BIOS not doing its job.

When I had proposed crash_boot parameter for kexec/kdump purposes, that time
andrew had suggested that he is afraid that driver authors will use this
parameter to solve all kind of problems. 

I think we should stick to the theme of the parameter and implement the
reset routine for cciss driver instead of simply returning back. Consider
the case of hypothetical scenario where somebody booted the kernel with
reset_device parameter (because of unreliable bios) and if there is a problem
on kernel side that after it issues the command it lost track of that
(because of kernel bug) then driver will never catch that bug as upon receiving
the response it will simply ignore that.

Mike, you know most about this device. Can you please help out with 
implementing a reset routing for it?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-18  4:30     ` Vivek Goyal
@ 2007-06-18 16:39       ` Miller, Mike (OS Dev)
  0 siblings, 0 replies; 8+ messages in thread
From: Miller, Mike (OS Dev) @ 2007-06-18 16:39 UTC (permalink / raw)
  To: vgoyal, Neil Horman; +Cc: linux-kernel, ISS StorageDev, akpm, SCSI Mailing List

Vivek wrote: 

> 
> I think this is not the right usage of reset_devices 
> parameter. This parameter instructs the driver to reset the 
> device before going ahead with rest of the initialization 
> before as underlying device might not be in a sane state. 
> kexec/kdump is one of the usages and this can also be useful 
> in the case of BIOS not doing its job.
> 
> When I had proposed crash_boot parameter for kexec/kdump 
> purposes, that time andrew had suggested that he is afraid 
> that driver authors will use this parameter to solve all kind 
> of problems. 
> 
> I think we should stick to the theme of the parameter and 
> implement the reset routine for cciss driver instead of 
> simply returning back. Consider the case of hypothetical 
> scenario where somebody booted the kernel with reset_device 
> parameter (because of unreliable bios) and if there is a 
> problem on kernel side that after it issues the command it 
> lost track of that (because of kernel bug) then driver will 
> never catch that bug as upon receiving the response it will 
> simply ignore that.
> 
> Mike, you know most about this device. Can you please help 
> out with implementing a reset routing for it?
> 
Vivek
I think I finally have an idea that will work. (`bout time!) We actually
have 2 different issues. One is that there may outstanding commands on
the controller when the kdump kernel initializes. Our SAS controllers
support the reset message defined in the open CISS spec which will
(hopefully) resolve this issue. The second problem is that I cannot
allocate my MSI-X vectors because I couldn't free the vectors then
disable MSI. So the cciss driver would most likely panic at that time.
My idea for this is to put the card into INTx mode rather than MSI or
MSI-X. That should the 2nd issue.
I haven't tested the 64xx series to see if they support the reset
message. I should to write the code today, maybe test by tomorrow and
then send something upstream.

mikem

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot
  2007-06-14 15:31 [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot Neil Horman
  2007-06-14 15:59 ` Randy Dunlap
  2007-06-14 18:16 ` Miller, Mike (OS Dev)
@ 2007-07-03 15:59 ` Miller, Mike (OS Dev)
  2 siblings, 0 replies; 8+ messages in thread
From: Miller, Mike (OS Dev) @ 2007-07-03 15:59 UTC (permalink / raw)
  To: Miller, Mike (OS Dev), Neil Horman, linux-kernel; +Cc: ISS StorageDev, akpm

 

> -----Original Message-----
> From: Miller, Mike (OS Dev) 
> Sent: Thursday, June 14, 2007 1:16 PM
> To: 'Neil Horman'; linux-kernel@vger.kernel.org
> Cc: ISS StorageDev; akpm@linux-foundation.org
> Subject: RE: [PATCH] cciss: force ignore of responses to 
> unsent scsi commands after kexec reboot
> 
>  
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, June 14, 2007 10:31 AM
> > To: linux-kernel@vger.kernel.org
> > Cc: Miller, Mike (OS Dev); ISS StorageDev; 
> akpm@linux-foundation.org; 
> > nhorman@tuxdriver.com
> > Subject: [PATCH] cciss: force ignore of responses to unsent scsi 
> > commands after kexec reboot
> > 
> > Hey -
> > 	cciss hardware currently can continue to send responses to scsi 
> > commands after the host system has undergone a kexec 
> reboot.  The way 
> > the drier is currently written, reception of these commands 
> results in 
> > a BUG halt, since it can't match the response to any issued command 
> > since the boot.  This patch corrects that by using the kexec 
> > reset_devices command line paramter to force ignore any 
> commands that 
> > it cant correlate.
> > 
> > Regards
> > Neil
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Nacked-by: Mike Miller <mike.miller@hp.com>

> > 
> > 
> >  cciss.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > 
> > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c index 
> > 5acc6c4..ec1c1d2 100644
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 cmd, int 
> > ctlr, unsigned long complete)
> >  		       ctlr, complete);
> >  		/* not much we can do. */
> >  #ifdef CONFIG_CISS_SCSI_TAPE
> > +		/* We might get notification of completion of commands
> > +		 * which we never issued in this kernel if this boot is
> > +		 * taking place after previous kernel's crash. Simply
> > +		 * ignore the commands in this case.
> > +		 */
> > +		if (reset_devices)
> > +			return 0;
> > +
> >  		return 1;
> >  	}
> >  
> I don't understand how this will help. We need to reset the 
> controller which reset_devices cannot do alone. I just 
> haven't have the time to implement the fix yet.
> 
> mikem 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-07-03 15:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-14 15:31 [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot Neil Horman
2007-06-14 15:59 ` Randy Dunlap
2007-06-14 16:44   ` Neil Horman
2007-06-14 18:16 ` Miller, Mike (OS Dev)
2007-06-14 19:25   ` Neil Horman
2007-06-18  4:30     ` Vivek Goyal
2007-06-18 16:39       ` Miller, Mike (OS Dev)
2007-07-03 15:59 ` Miller, Mike (OS Dev)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).