All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-11-02  7:35 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-02 16:30 ` Stefan Richter
  2009-11-02 17:45 ` James Bottomley
  0 siblings, 2 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-02  7:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,


Please let me know your opinion on the explanation given by me in the previous e-mail to the issue raised by you so that I will proceed further based on your feedback as lot of IBM customers and RedHat have been waiting for this patch to come in linux-scsi upstream even though we have given private build to some of the IBM customers like Cisco and SAP?

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Sent: Friday, October 23, 2009 6:41 PM
To: 'James Bottomley'
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James,

Thanks a lot for your feedback.

Here is my explanation for the change. We have done good amount of review and testing before finalizing the change. We found the following code also caused to exit the management request prematurely without getting response from the firmware during our review, which in turn caused to generated False Raid Alert events in the application layer.

So after doing good amount of review and testing, we replace the below code

            else if (down_interruptible(&fibptr->event_wait)) {
                      fibptr->done = 2;
                      up(&fibptr->event_wait);
          }

With the following code
            } else
                down_interruptible (&fibptr->event_wait);

To overcome the warning, we can replace the above code with either

            } else if (down_interruptible (&fibptr->event_wait));

                        Or

            } else {
                 if (down_interruptible (&fibptr->event_wait))
                        ;
            }

I am ok to do the above mentioned change in the code and resubmit the patch with the above change as there is no issue with respect to functionality and other aspects as such.

Please let me know your opinion on this so that I will proceed further based on your feedback.


Important note:
-----------------

fibptr->event_wait is initialized with the following OS kernel function call during the driver initialization.

        init_MUTEX_LOCKED(&fibptr->event_wait);

The above OS kernel function call initializes the semaphore to locked (unavailable) state. On using the fibptr->event_wait by down_interruptible, the semaphore is not available so process/task will enter into sleep mode until the semaphore is available. The semaphore will be available only when the response gets from the firmware for the request delivered to the firmware. When response comes from the firmware then driver's response path code calls the up() if the fibpter->done is equal to zero. So the corresponding task/process gets semaphore and wakes the task/process up and proceeds with further processing.

Suppose the process/task, who is waiting for semaphore, gets exited before getting response from the firmware due to some signal, then driver should not call up() function as there is no point calling up() function after exiting as the fibptr->event_wait was initialized to locked (unavailable) state during the driver initialization.

So my point, here, is that we should not call up() if process/task is exited prematurely before getting the response from the firmware for the request delivered to firmware as the fibptr->event_wait was initialized to locked (unavailable) state.
---------------------------------------------------------------------------

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Thursday, October 22, 2009 6:40 AM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Fri, 2009-10-09 at 14:53 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
> Can you please let us know your feedback on my response to the queries
> raised by James so that I will proceed further based on your feedback?

Sorry, press of conferences has delayed me getting around to looking at
this.

The patch is spitting this compile warning:

drivers/scsi/aacraid/commsup.c: In function 'aac_fib_send':
drivers/scsi/aacraid/commsup.c:539: warning: ignoring return value of 'down_interruptible', declared with attribute warn_unused_result

Because of this hunk:

@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned
                                }
                                udelay(5);
                        }
-               } else if (down_interruptible(&fibptr->event_wait)) {
-                       fibptr->done = 2;
-                       up(&fibptr->event_wait);
-               }
+               } else
+                       down_interruptible(&fibptr->event_wait);


The reason for the warning is that down_interruptible() will return an
error if it gets interrupted without acquiring the semaphore, and the
kernel checkers believe you can't write correct code without knowing
whether you acquired the semaphore or not.

Now, the original use case didn't care it was just waiting for the
semaphore to become available but didn't actually want to acquire it, so
I suspect what you want is:

} else if (down_interruptible(&fibptr->event_wait))
                     up(&fibptr->event_wait);

If you suddenly do care about acquiring the semaphore, you need to do
something about the interrupted failure case.

James




DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-02  7:35 [PATCH ] scsi-misc-2.6: File System going into read-only mode Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-02 16:30 ` Stefan Richter
  2009-11-03  9:54   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-02 17:45 ` James Bottomley
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Richter @ 2009-11-02 16:30 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Hi James,
> 
> Please let me know your opinion on the explanation given by me in the
> previous e-mail to the issue raised by you so that I will proceed
> further based on your feedback
[...]
> -----Original Message-----
> From: Penchala Narasimha Reddy Chilakala, TLS-Chennai
> Sent: Friday, October 23, 2009 6:41 PM
[...]
> Here is my explanation for the change. We have done good amount of
> review and testing before finalizing the change. We found the following
> code also caused to exit the management request prematurely without
> getting response from the firmware during our review, which in turn
> caused to generated False Raid Alert events in the application layer.
> 
> So after doing good amount of review and testing, we replace the below
> code
> 
>             else if (down_interruptible(&fibptr->event_wait)) {
>                       fibptr->done = 2;
>                       up(&fibptr->event_wait);
>           }
> 
> With the following code
>             } else
>                 down_interruptible (&fibptr->event_wait);
> 
> To overcome the warning,

[i.e. "warning: ignoring return value of 'down_interruptible', declared
with attribute warn_unused_result"]

> we can replace the above code with either
> 
>             } else if (down_interruptible (&fibptr->event_wait));
> 
>                         Or
> 
>             } else {
>                  if (down_interruptible (&fibptr->event_wait))
>                         ;
>             }
> 
> I am ok to do the above mentioned change in the code and resubmit the
> patch with the above change as there is no issue with respect to
> functionality and other aspects as such.

I'm not James, but I do have a suggestion:

Leave the compiler warning there for the short term.  And then, instead
of suppressing the warning with hard to read code, convert from
semaphore to completion (or waitqueue perhaps).

I'm not familiar with aacraid but your problem description sounds to me
as if the legacy counting semaphore API is not the correct kernel API to
use here.
-- 
Stefan Richter
-=====-==--= =-=- ===--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-02  7:35 [PATCH ] scsi-misc-2.6: File System going into read-only mode Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-02 16:30 ` Stefan Richter
@ 2009-11-02 17:45 ` James Bottomley
  2009-11-03  9:38   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-11-02 17:45 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Mon, 2009-11-02 at 13:05 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Please let me know your opinion on the explanation given by me in the
> previous e-mail to the issue raised by you so that I will proceed
> further based on your feedback as lot of IBM customers and RedHat have
> been waiting for this patch to come in linux-scsi upstream even though
> we have given private build to some of the IBM customers like Cisco
> and SAP?

I need the warning fixed.

How about this as the fix suggestion on top of your original patch:

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index d29af45..1712ebe 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -535,11 +535,11 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
 				}
 				udelay(5);
 			}
-		} else
-			down_interruptible(&fibptr->event_wait);
-
+		} else if (down_interruptible(&fibptr->event_wait)) {
+			fibptr->done = 2;
+		}
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if (fibptr->done == 0) {
+		if ((fibptr->done == 0) || (fibptr->done == 2)) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
 			return -ERESTARTSYS;

It preserves the original code, even though setting fibptr->done to 2 is
a bit superfluous, it makes it much more obvious to someone looking at
the diff what the actual fix is.

James



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-02 17:45 ` James Bottomley
@ 2009-11-03  9:38   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-03 15:08     ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-03  9:38 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,

I can do the following thing, I hope which works well and it would not change any existing functionality behavior and I hope you agree with me. I could not able accept your suggestion because of the following reason.

As the warning fix, here is my suggestion on top of my original patch (I will resubmit a patch with this modification, please let me know if it is ok for you)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index d29af45..1712ebe 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -535,11 +535,11 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
 				}
 				udelay(5);
 			}
-		} else
-			down_interruptible(&fibptr->event_wait);
-
+		} else if (down_interruptible(&fibptr->event_wait)) {
+			fibptr->done |= 2;
+		}
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if (fibptr->done == 0) {
+		if (!(fibptr->done & 1)) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
 			return -ERESTARTSYS;


Reason:
---------
	Whatever explanation I have given here is based on our test results and problems we encountered during our testing. Nobody might have done this much testing on this code after replacing the "down ()" with "down_interruptible () kernel function. 

"down_interruptible" is not a single statement or single instruction function. It has set of statements, which have to be executed based on its conditions. When it receives a soft interrupt signal, a set of statements will be executed and when it gets semaphore, some other set of statements will be executed. 

	Now, assume that down_interruptible received a soft interrupt signal from X source and started executing those statements and while executing those statements, host driver received a interrupt from aacraid controller and suspends the currently executing task and starts executing the aacraid driver interrupt service route (ISR).  The ISR checks the responses for the requests sent to firmware, if it gets response for the management request then it sets fibptr->done = 1 and it comes out of the ISR and starts executing the suspended task. Now the suspended task will modify the fibptr->done with 2. This is not correct behavior. So, to over come that we have done those changes, after doing those changes, we have executed all the test cases we executed before the changes and we have not seen the
  issue we encountered before the change.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Monday, November 02, 2009 11:16 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Mon, 2009-11-02 at 13:05 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Please let me know your opinion on the explanation given by me in the
> previous e-mail to the issue raised by you so that I will proceed
> further based on your feedback as lot of IBM customers and RedHat have
> been waiting for this patch to come in linux-scsi upstream even though
> we have given private build to some of the IBM customers like Cisco
> and SAP?

I need the warning fixed.

How about this as the fix suggestion on top of your original patch:

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index d29af45..1712ebe 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -535,11 +535,11 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
 				}
 				udelay(5);
 			}
-		} else
-			down_interruptible(&fibptr->event_wait);
-
+		} else if (down_interruptible(&fibptr->event_wait)) {
+			fibptr->done = 2;
+		}
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if (fibptr->done == 0) {
+		if ((fibptr->done == 0) || (fibptr->done == 2)) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
 			return -ERESTARTSYS;

It preserves the original code, even though setting fibptr->done to 2 is
a bit superfluous, it makes it much more obvious to someone looking at
the diff what the actual fix is.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-02 16:30 ` Stefan Richter
@ 2009-11-03  9:54   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-03 19:02     ` Stefan Richter
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-03  9:54 UTC (permalink / raw)
  To: Stefan Richter
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi Stefan,

Thanks a lot for your suggestion and feedback.

Aacraid driver has been using down_interruptible () and up () for quite some time. We will take your suggestion into consideration and keep in our mind. But it is quite difficult to implement your suggestion at this point of time as we need to modify the code in lot places and we need to test a lot.

Thanks,
Narasimha Reddy


-----Original Message-----
From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de]
Sent: Monday, November 02, 2009 10:00 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: James Bottomley; 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Hi James,
>
> Please let me know your opinion on the explanation given by me in the
> previous e-mail to the issue raised by you so that I will proceed
> further based on your feedback
[...]
> -----Original Message-----
> From: Penchala Narasimha Reddy Chilakala, TLS-Chennai
> Sent: Friday, October 23, 2009 6:41 PM
[...]
> Here is my explanation for the change. We have done good amount of
> review and testing before finalizing the change. We found the following
> code also caused to exit the management request prematurely without
> getting response from the firmware during our review, which in turn
> caused to generated False Raid Alert events in the application layer.
>
> So after doing good amount of review and testing, we replace the below
> code
>
>             else if (down_interruptible(&fibptr->event_wait)) {
>                       fibptr->done = 2;
>                       up(&fibptr->event_wait);
>           }
>
> With the following code
>             } else
>                 down_interruptible (&fibptr->event_wait);
>
> To overcome the warning,

[i.e. "warning: ignoring return value of 'down_interruptible', declared
with attribute warn_unused_result"]

> we can replace the above code with either
>
>             } else if (down_interruptible (&fibptr->event_wait));
>
>                         Or
>
>             } else {
>                  if (down_interruptible (&fibptr->event_wait))
>                         ;
>             }
>
> I am ok to do the above mentioned change in the code and resubmit the
> patch with the above change as there is no issue with respect to
> functionality and other aspects as such.

I'm not James, but I do have a suggestion:

Leave the compiler warning there for the short term.  And then, instead
of suppressing the warning with hard to read code, convert from
semaphore to completion (or waitqueue perhaps).

I'm not familiar with aacraid but your problem description sounds to me
as if the legacy counting semaphore API is not the correct kernel API to
use here.
--
Stefan Richter
-=====-==--= =-=- ===--
http://arcgraph.de/sr/

DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03  9:38   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-03 15:08     ` James Bottomley
  2009-11-04  7:07       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-11-03 15:08 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Tue, 2009-11-03 at 15:08 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James,
> 
> I can do the following thing, I hope which works well and it would not change any existing functionality behavior and I hope you agree with me. I could not able accept your suggestion because of the following reason.
> 
> As the warning fix, here is my suggestion on top of my original patch (I will resubmit a patch with this modification, please let me know if it is ok for you)
> 
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index d29af45..1712ebe 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -535,11 +535,11 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
>  				}
>  				udelay(5);
>  			}
> -		} else
> -			down_interruptible(&fibptr->event_wait);
> -
> +		} else if (down_interruptible(&fibptr->event_wait)) {
> +			fibptr->done |= 2;
> +		}
>  		spin_lock_irqsave(&fibptr->event_lock, flags);
> -		if (fibptr->done == 0) {
> +		if (!(fibptr->done & 1)) {
>  			fibptr->done = 2; /* Tell interrupt we aborted */
>  			spin_unlock_irqrestore(&fibptr->event_lock, flags);
>  			return -ERESTARTSYS;
> 
> 
> Reason:
> ---------
> 	Whatever explanation I have given here is based on our test results
> and problems we encountered during our testing. Nobody might have done
> this much testing on this code after replacing the "down ()" with
> "down_interruptible () kernel function. 
> 
> "down_interruptible" is not a single statement or single instruction
> function. It has set of statements, which have to be executed based on
> its conditions. When it receives a soft interrupt signal, a set of
> statements will be executed and when it gets semaphore, some other set
> of statements will be executed. 
> 
> 	Now, assume that down_interruptible received a soft interrupt signal
> from X source and started executing those statements and while
> executing those statements, host driver received a interrupt from
> aacraid controller and suspends the currently executing task and
> starts executing the aacraid driver interrupt service route (ISR).
> The ISR checks the responses for the requests sent to firmware, if it
> gets response for the management request then it sets fibptr->done = 1
> and it comes out of the ISR and starts executing the suspended task.
> Now the suspended task will modify the fibptr->done with 2. This is
> not correct behavior. So, to over come that we have done those
> changes, after doing those changes, we have executed all the test
> cases we executed before the changes and we have not seen the issue we
> encountered before the change.

Yes, I agree with the analysis.  In that case just make it

else if (down_interruptible(&fibptr->event_wait)) {
	/* do nothing ... satisfy down_interruptible must_check */
}

And keep the original.

James



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03  9:54   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-03 19:02     ` Stefan Richter
  2009-11-04  7:00       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 2 replies; 40+ messages in thread
From: Stefan Richter @ 2009-11-03 19:02 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Hi Stefan,
> 
> Thanks a lot for your suggestion and feedback.
> 
> Aacraid driver has been using down_interruptible () and up () for
> quite some time. We will take your suggestion into consideration
> and keep in our mind. But it is quite difficult to implement your
> suggestion at this point of time as we need to modify the code in
> lot places and we need to test a lot.

dpcsup.c:	2x up(&fib->event_wait)
commsup.c:	allocation and initialization of the semaphore
		1x down_trylock(&fibptr->event_wait) +
		1x down_interruptible(&fibptr->event_wait) + up()
		1x up(&fib->event_wait)

So there are just two places which conditionally wait for completion (or
just one place but with a differentiation between sync and async mode),
and three places which signal "done".

You are right that this would be a non-trivial change and will require
respective testing.  However, (1.) only the place which waits for
completion is the non-trivial part (because it's currently coded in a
rather obfuscated way), (2.) I expect that the result would be cleaner
code which is (3.) more obvious to be correct and (4.) can also be
checked for correctness by the lockdep facility --- this is not possible
with legacy semaphores.

Anyway; it's up to those who use or maintain this driver.  I for one
converted another subsystem away from two or three different misused
semaphores to more appropriate APIs and it wasn't too hard; but I can't
do this for aacraid since I don't have hardware to test.

(In my rather irrelevant opinion, whatever you decide for now, just do
_not_ do this here:

>>> we can replace the above code with either
>>>
>>>             } else if (down_interruptible (&fibptr->event_wait));
>>>
>>>                         Or
>>>
>>>             } else {
>>>                  if (down_interruptible (&fibptr->event_wait))
>>>                         ;
>>>             }

because it can really hurt in the long run if sensible warnings are hidden.)
-- 
Stefan Richter
-=====-==--= =-== ---==
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 19:02     ` Stefan Richter
@ 2009-11-04  7:00       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-04  7:00 UTC (permalink / raw)
  To: Stefan Richter
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi Stefan,

All your feedback will be considered, and need to be discussed with other people before trying it out in aacraid code, but it would be taken as experiment bases because we need to work on other customer defects as well and it needs lot of testing as I said earlier. So it will take quite a sometime.

Thanks,
Narasimha Reddy

-----Original Message-----
From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de]
Sent: Wednesday, November 04, 2009 12:33 AM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: James Bottomley; 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Hi Stefan,
>
> Thanks a lot for your suggestion and feedback.
>
> Aacraid driver has been using down_interruptible () and up () for
> quite some time. We will take your suggestion into consideration
> and keep in our mind. But it is quite difficult to implement your
> suggestion at this point of time as we need to modify the code in
> lot places and we need to test a lot.

dpcsup.c:       2x up(&fib->event_wait)
commsup.c:      allocation and initialization of the semaphore
                1x down_trylock(&fibptr->event_wait) +
                1x down_interruptible(&fibptr->event_wait) + up()
                1x up(&fib->event_wait)

So there are just two places which conditionally wait for completion (or
just one place but with a differentiation between sync and async mode),
and three places which signal "done".

You are right that this would be a non-trivial change and will require
respective testing.  However, (1.) only the place which waits for
completion is the non-trivial part (because it's currently coded in a
rather obfuscated way), (2.) I expect that the result would be cleaner
code which is (3.) more obvious to be correct and (4.) can also be
checked for correctness by the lockdep facility --- this is not possible
with legacy semaphores.

Anyway; it's up to those who use or maintain this driver.  I for one
converted another subsystem away from two or three different misused
semaphores to more appropriate APIs and it wasn't too hard; but I can't
do this for aacraid since I don't have hardware to test.

(In my rather irrelevant opinion, whatever you decide for now, just do
_not_ do this here:

>>> we can replace the above code with either
>>>
>>>             } else if (down_interruptible (&fibptr->event_wait));
>>>
>>>                         Or
>>>
>>>             } else {
>>>                  if (down_interruptible (&fibptr->event_wait))
>>>                         ;
>>>             }

because it can really hurt in the long run if sensible warnings are hidden.)
--
Stefan Richter
-=====-==--= =-== ---==
http://arcgraph.de/sr/

DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 15:08     ` James Bottomley
@ 2009-11-04  7:07       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04 15:42         ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-04  7:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,

Please let me know that shall I generate new patch with this modification and resubmit it?

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Tuesday, November 03, 2009 8:39 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-11-03 at 15:08 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James,
>
> I can do the following thing, I hope which works well and it would not change any existing functionality behavior and I hope you agree with me. I could not able accept your suggestion because of the following reason.
>
> As the warning fix, here is my suggestion on top of my original patch (I will resubmit a patch with this modification, please let me know if it is ok for you)
>
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index d29af45..1712ebe 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -535,11 +535,11 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
>                               }
>                               udelay(5);
>                       }
> -             } else
> -                     down_interruptible(&fibptr->event_wait);
> -
> +             } else if (down_interruptible(&fibptr->event_wait)) {
> +                     fibptr->done |= 2;
> +             }
>               spin_lock_irqsave(&fibptr->event_lock, flags);
> -             if (fibptr->done == 0) {
> +             if (!(fibptr->done & 1)) {
>                       fibptr->done = 2; /* Tell interrupt we aborted */
>                       spin_unlock_irqrestore(&fibptr->event_lock, flags);
>                       return -ERESTARTSYS;
>
>
> Reason:
> ---------
>       Whatever explanation I have given here is based on our test results
> and problems we encountered during our testing. Nobody might have done
> this much testing on this code after replacing the "down ()" with
> "down_interruptible () kernel function.
>
> "down_interruptible" is not a single statement or single instruction
> function. It has set of statements, which have to be executed based on
> its conditions. When it receives a soft interrupt signal, a set of
> statements will be executed and when it gets semaphore, some other set
> of statements will be executed.
>
>       Now, assume that down_interruptible received a soft interrupt signal
> from X source and started executing those statements and while
> executing those statements, host driver received a interrupt from
> aacraid controller and suspends the currently executing task and
> starts executing the aacraid driver interrupt service route (ISR).
> The ISR checks the responses for the requests sent to firmware, if it
> gets response for the management request then it sets fibptr->done = 1
> and it comes out of the ISR and starts executing the suspended task.
> Now the suspended task will modify the fibptr->done with 2. This is
> not correct behavior. So, to over come that we have done those
> changes, after doing those changes, we have executed all the test
> cases we executed before the changes and we have not seen the issue we
> encountered before the change.

Yes, I agree with the analysis.  In that case just make it

else if (down_interruptible(&fibptr->event_wait)) {
        /* do nothing ... satisfy down_interruptible must_check */
}

And keep the original.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 19:02     ` Stefan Richter
  2009-11-04  7:00       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04 14:48         ` Stefan Richter
  1 sibling, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-04  9:18 UTC (permalink / raw)
  To: Stefan Richter
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi Stefan,

Yes, you are right that the up() is used in 3 places and down_trylock, down_interruptible in one place each, but we need to do other changes like when we return a value ERESTATSYS from that particular function, the value will be checked in many places to take appropriate action and fibptr->done also checked in some other places. Considering all these, I mentioned that we need to modify the code in lot places.

I hope you understand.

Thanks,
Narasimha Reddy

-----Original Message-----
From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de]
Sent: Wednesday, November 04, 2009 12:33 AM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: James Bottomley; 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Hi Stefan,
>
> Thanks a lot for your suggestion and feedback.
>
> Aacraid driver has been using down_interruptible () and up () for
> quite some time. We will take your suggestion into consideration
> and keep in our mind. But it is quite difficult to implement your
> suggestion at this point of time as we need to modify the code in
> lot places and we need to test a lot.

dpcsup.c:       2x up(&fib->event_wait)
commsup.c:      allocation and initialization of the semaphore
                1x down_trylock(&fibptr->event_wait) +
                1x down_interruptible(&fibptr->event_wait) + up()
                1x up(&fib->event_wait)

So there are just two places which conditionally wait for completion (or
just one place but with a differentiation between sync and async mode),
and three places which signal "done".

You are right that this would be a non-trivial change and will require
respective testing.  However, (1.) only the place which waits for
completion is the non-trivial part (because it's currently coded in a
rather obfuscated way), (2.) I expect that the result would be cleaner
code which is (3.) more obvious to be correct and (4.) can also be
checked for correctness by the lockdep facility --- this is not possible
with legacy semaphores.

Anyway; it's up to those who use or maintain this driver.  I for one
converted another subsystem away from two or three different misused
semaphores to more appropriate APIs and it wasn't too hard; but I can't
do this for aacraid since I don't have hardware to test.

(In my rather irrelevant opinion, whatever you decide for now, just do
_not_ do this here:

>>> we can replace the above code with either
>>>
>>>             } else if (down_interruptible (&fibptr->event_wait));
>>>
>>>                         Or
>>>
>>>             } else {
>>>                  if (down_interruptible (&fibptr->event_wait))
>>>                         ;
>>>             }

because it can really hurt in the long run if sensible warnings are hidden.)
--
Stefan Richter
-=====-==--= =-== ---==
http://arcgraph.de/sr/

DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-04 14:48         ` Stefan Richter
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Richter @ 2009-11-04 14:48 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: James Bottomley, 'linux-scsi@vger.kernel.org', ServeRAID Driver

Penchala Narasimha Reddy Chilakala, TLS-Chennai wrote:
> Yes, you are right that the up() is used in 3 places and down_trylock,
> down_interruptible in one place each, but we need to do other changes
> like when we return a value ERESTATSYS from that particular function,
> the value will be checked in many places to take appropriate action
> and fibptr->done also checked in some other places. Considering all
> these, I mentioned that we need to modify the code in lot places.

I disagree with this analysis.  I am only talking about the type of
struct fib.event_wait which very probably should be a struct completion
or wait_queue_head_t.

> I hope you understand.

If and why you as author and submitter of the short term fix are also
interested in a real long term fix or not is _entirely_ up to you, or
anybody else who works with the aacraid driver.  If I had this hardware
myself I would send a proposed conversion patch instead of comments what
other people could do.  :-)
-- 
Stefan Richter
-=====-==--= =-== --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  7:07       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-04 15:42         ` James Bottomley
  0 siblings, 0 replies; 40+ messages in thread
From: James Bottomley @ 2009-11-04 15:42 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Wed, 2009-11-04 at 12:37 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Please let me know that shall I generate new patch with this modification and resubmit it?

Please do ... even if it's not quite right yet, it makes sure we're all
talking about the same piece of code.

James



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE:  [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-24  6:29 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-24  6:29 UTC (permalink / raw)
  To: James Bottomley, 'linux-scsi; +Cc: ServeRAID Driver

Hi James and linux-scsi community,

Please let us your comments on the patch (community_aac24702.patch) we submitted on December 21, 2009.

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech 
Sent: Monday, December 21, 2009 6:39 PM
To: 'linux-scsi@vger.kernel.org'; 'James Bottomley'
Cc: ServeRAID Driver
Subject: [PATCH ] scsi-misc-2.6: File System going into read-only mode 

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch, which contains the fix for racy issue (which was raised by James) and do the needful to incorporate the patch in up-coming release. 

Note: The patch was generated against the "kernel 2.6.33-rc1"

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

	The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

	These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
------- 
	False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry. 

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.
       
Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-21 13:09 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-21 13:09 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3692 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch, which contains the fix for racy issue (which was raised by James) and do the needful to incorporate the patch in up-coming release. 

Note: The patch was generated against the "kernel 2.6.33-rc1"

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

	The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

	These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
------- 
	False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry. 

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.
       
Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24702.email --]
[-- Type: application/octet-stream, Size: 670 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++------
 drivers/scsi/aacraid/aacraid.h  |    5 ++
 drivers/scsi/aacraid/commctrl.c |   28 +++++++--------
 drivers/scsi/aacraid/comminit.c |    6 ++-
 drivers/scsi/aacraid/commsup.c  |   72 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++----
 6 files changed, 154 insertions(+), 45 deletions(-)

Sincerely - Penchala Narasimha Reddy

[-- Attachment #3: community_aac24702.patch --]
[-- Type: application/octet-stream, Size: 14750 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-12-21 12:57:48.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-12-21 12:58:44.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24702
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-12-21 12:59:28.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -848,7 +848,7 @@ int aac_do_ioctl(struct aac_dev * dev, i
 	 */
 
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-12-21 12:59:58.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-12-21 18:13:59.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -390,6 +397,8 @@ int aac_fib_send(u16 command, struct fib
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 	unsigned long flags = 0;
 	unsigned long qflags;
+	unsigned long mflags = 0;
+
 
 	if (!(hw_fib->header.XferState & cpu_to_le32(HostOwned)))
 		return -EBUSY;
@@ -471,9 +480,31 @@ int aac_fib_send(u16 command, struct fib
 	if (!dev->queues)
 		return -EBUSY;
 
-	if(wait)
+	if (wait) {
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		if (dev->management_fib_count >= AAC_NUM_MGT_FIB) {
+			printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+			return -EBUSY;
+		}
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+	}
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait) {
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		}
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
@@ -516,14 +547,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +721,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +742,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1395,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1802,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1813,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-12-21 13:01:21.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-18 12:25 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-18 12:25 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

Hi James,

Please find the attached patch, in which we have fixed the race condition issue and on which we started running our test cases as well, review the patch and let us know your final comments on the patch.

Thanks,
Narasimha Reddy


-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Monday, November 16, 2009 9:21 PM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Mon, 2009-11-16 at 08:55 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
>
> Thanks a lot for your nice review, feedback and support so far. I hope
> that we have sorted out all the issues raised by you except the racy
> issue.

Yes ... I think the race problem isn't particularly relevant to the
bugs, so send me a patch without the attempted ioctl fib counting and it
can go into -rc ... you can work on the fib counting later for
scsi-misc.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24702.email --]
[-- Type: application/octet-stream, Size: 679 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   28 ++++++++---------
 drivers/scsi/aacraid/comminit.c |    6 +++
 drivers/scsi/aacraid/commsup.c  |   65 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++++----
 6 files changed, 148 insertions(+), 44 deletions(-)

Sincerely - Penchala Narasimha Reddy

[-- Attachment #3: community_aac24702.patch --]
[-- Type: application/octet-stream, Size: 14468 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-12-18 17:09:42.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-12-18 17:10:08.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24702
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-12-18 17:14:37.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -848,7 +848,7 @@ int aac_do_ioctl(struct aac_dev * dev, i
 	 */
 
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-12-18 17:09:42.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-12-18 17:15:10.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,33 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		if (dev->management_fib_count >= AAC_NUM_MGT_FIB) {
+			printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+			return -EBUSY;
+		}
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -516,14 +542,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +716,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +737,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1390,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1797,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1808,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-12-18 17:09:42.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-16 15:51         ` James Bottomley
  2009-11-17  5:22           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
@ 2009-11-17  5:27           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  1 sibling, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-11-17  5:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

Hi James,

Please ignore the previous mail and the attached patch in that mail.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Monday, November 16, 2009 9:21 PM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Mon, 2009-11-16 at 08:55 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
>
> Thanks a lot for your nice review, feedback and support so far. I hope
> that we have sorted out all the issues raised by you except the racy
> issue.

Yes ... I think the race problem isn't particularly relevant to the
bugs, so send me a patch without the attempted ioctl fib counting and it
can go into -rc ... you can work on the fib counting later for
scsi-misc.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-16 15:51         ` James Bottomley
@ 2009-11-17  5:22           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  2009-11-17  5:27           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  1 sibling, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-11-17  5:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

Hi James,

Please find the attached updated patch without the attempted ioctl fib counting and make sure the patch would come in the -rc. 

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Monday, November 16, 2009 9:21 PM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Mon, 2009-11-16 at 08:55 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
>
> Thanks a lot for your nice review, feedback and support so far. I hope
> that we have sorted out all the issues raised by you except the racy
> issue.

Yes ... I think the race problem isn't particularly relevant to the
bugs, so send me a patch without the attempted ioctl fib counting and it
can go into -rc ... you can work on the fib counting later for
scsi-misc.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24701.email --]
[-- Type: application/octet-stream, Size: 693 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 +++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   37 +++++++++++++++----------
 drivers/scsi/aacraid/comminit.c |    6 +++-
 drivers/scsi/aacraid/commsup.c  |   59 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++++++----
 6 files changed, 151 insertions(+), 44 deletions(-)

Sincerely - Penchala Narasimha Reddy

[-- Attachment #3: community_aac24701.patch --]
[-- Type: application/octet-stream, Size: 14706 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-09-25 11:43:12.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-09-25 12:13:55.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24701
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-09-25 12:07:44.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
 int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 {
 	int status;
+	unsigned long mflags;
 
 	/*
 	 *	HBA gets first crack
 	 */
 
+	spin_lock_irqsave(&dev->manage_lock, mflags);
+	if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
+		printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-09-25 11:54:23.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-11-05 16:49:13.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,27 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -516,14 +536,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +710,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +731,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1384,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1791,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1802,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-09-25 12:08:44.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-16  3:25       ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
@ 2009-11-16 15:51         ` James Bottomley
  2009-11-17  5:22           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  2009-11-17  5:27           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 2 replies; 40+ messages in thread
From: James Bottomley @ 2009-11-16 15:51 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

On Mon, 2009-11-16 at 08:55 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
> 
> Thanks a lot for your nice review, feedback and support so far. I hope
> that we have sorted out all the issues raised by you except the racy
> issue.

Yes ... I think the race problem isn't particularly relevant to the
bugs, so send me a patch without the attempted ioctl fib counting and it
can go into -rc ... you can work on the fib counting later for
scsi-misc.

James



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-13 20:55     ` James Bottomley
@ 2009-11-16  3:25       ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  2009-11-16 15:51         ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-11-16  3:25 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

Hi James,

Thanks a lot for your nice review, feedback and support so far. I hope that we have sorted out all the issues raised by you except the racy issue.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Saturday, November 14, 2009 2:25 AM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Thu, 2009-11-12 at 13:58 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
>
> Thank you very much for your suggestion.
>
> I feel that we need to work out for a better solution which would not
> create any issues. We have to go ahead with the patch as it is tested
> and working as expected (with out any issues) even though we are
> thinking there might be some racy issue until we come out with a
> better solution than the current one

Sure, I can wait for a better solution.

> I am not quite ok with the fix because of the following reasons:
>
>         1. I feel that calling the dev->management_fib_count--
> statement in one place, that is in dpcsup.c (Whenever interrupt comes
> from arcraid controller, the corresponding ISR route will be called
> and the value will be decremented, if the response is for the
> requested management fib) is good  instead of calling the statement in
> two places, which are dpcsup.c and commctrl.c.

Um, so that's the same mechanism you implemented for this undone fib
problem: setting the ->done flag as mediator.

>      2. Moving the dev->management_fib_count++ from aac_fib_send ()
> function to aac_do_ioctl () function did not work well during our
> testing (This was suggested by you in one of the earlier mails).

That doesn't give me a lot to go on.

However, I'm not particularly bothered whose solution it is, just that
we have a solution that isn't demonstrably racy.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-12  8:28   ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
@ 2009-11-13 20:55     ` James Bottomley
  2009-11-16  3:25       ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-11-13 20:55 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

On Thu, 2009-11-12 at 13:58 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
> 
> Thank you very much for your suggestion.
> 
> I feel that we need to work out for a better solution which would not
> create any issues. We have to go ahead with the patch as it is tested
> and working as expected (with out any issues) even though we are
> thinking there might be some racy issue until we come out with a
> better solution than the current one

Sure, I can wait for a better solution.

> I am not quite ok with the fix because of the following reasons:
> 
>         1. I feel that calling the dev->management_fib_count--
> statement in one place, that is in dpcsup.c (Whenever interrupt comes
> from arcraid controller, the corresponding ISR route will be called
> and the value will be decremented, if the response is for the
> requested management fib) is good  instead of calling the statement in
> two places, which are dpcsup.c and commctrl.c.

Um, so that's the same mechanism you implemented for this undone fib
problem: setting the ->done flag as mediator.

>      2. Moving the dev->management_fib_count++ from aac_fib_send ()
> function to aac_do_ioctl () function did not work well during our
> testing (This was suggested by you in one of the earlier mails).

That doesn't give me a lot to go on.

However, I'm not particularly bothered whose solution it is, just that
we have a solution that isn't demonstrably racy.

James



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-11 19:26 ` James Bottomley
@ 2009-11-12  8:28   ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  2009-11-13 20:55     ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-11-12  8:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

Hi James,

Thank you very much for your suggestion.

I feel that we need to work out for a better solution which would not create any issues. We have to go ahead with the patch as it is tested and working as expected (with out any issues) even though we are thinking there might be some racy issue until we come out with a better solution than the current one

I am not quite ok with the fix because of the following reasons:

        1. I feel that calling the dev->management_fib_count-- statement in one place, that is in dpcsup.c (Whenever interrupt comes from arcraid controller, the corresponding ISR route will be called and the value will be decremented, if the response is for the requested management fib) is good  instead of calling the statement in two places, which are dpcsup.c and commctrl.c.

     2. Moving the dev->management_fib_count++ from aac_fib_send () function to aac_do_ioctl () function did not work well during our testing (This was suggested by you in one of the earlier mails).


Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Thursday, November 12, 2009 12:57 AM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Thu, 2009-11-05 at 18:57 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached updated patch with the following modification, which is suggested by James in the previous mails and do the needful to incorporate the patch in up-coming release.
>
>
> else if (down_interruptible(&fibptr->event_wait)) {
>         /* Do nothing ... satisfy
>          * down_interruptible must_check */
>  }
>

Yes, that's fine.

The management fib stuff is still racy as I pointed out previously. The
attached should fix it up, is this OK with you?


James

---

diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index b1fc0a0..795fb05 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -855,10 +855,11 @@ int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
                spin_unlock_irqrestore(&dev->manage_lock, mflags);
                return -EBUSY;
        }
+       dev->management_fib_count++;
        spin_unlock_irqrestore(&dev->manage_lock, mflags);
        status = aac_dev_ioctl(dev, cmd, arg);
        if (status != -ENOTTY)
-               return status;
+               goto out;

        switch (cmd) {
        case FSACTL_MINIPORT_REV_CHECK:
@@ -887,6 +888,13 @@ int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
                status = -ENOTTY;
                break;
        }
+ out:
+       if (status != -ERESTARTSYS) {
+               spin_lock_irqsave(&dev->manage_lock, mflags);
+               --dev->management_fib_count;
+               spin_unlock_irqrestore(&dev->manage_lock, mflags);
+       }
+
        return status;
 }

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 99729a6..e00c6a9 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -494,13 +494,8 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
         */

        if (wait) {
-               unsigned long mflags;
                spin_unlock_irqrestore(&fibptr->event_lock, flags);

-               spin_lock_irqsave(&dev->manage_lock, mflags);
-               dev->management_fib_count++;
-               spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
                /* Only set for first known interruptable command */
                if (wait < 0) {
                        /*
diff --git a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
index 9c7408f..d06bfd2 100644
--- a/drivers/scsi/aacraid/dpcsup.c
+++ b/drivers/scsi/aacraid/dpcsup.c
@@ -57,7 +57,7 @@ unsigned int aac_response_normal(struct aac_queue * q)
        struct hw_fib * hwfib;
        struct fib * fib;
        int consumed = 0;
-       unsigned long flags, mflags;
+       unsigned long flags;

        spin_lock_irqsave(q->lock, flags);
        /*
@@ -131,13 +131,13 @@ unsigned int aac_response_normal(struct aac_queue * q)
                        }
                        spin_unlock_irqrestore(&fib->event_lock, flagv);

-                       spin_lock_irqsave(&dev->manage_lock, mflags);
-                       dev->management_fib_count--;
-                       spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
                        FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
                        if (fib->done == 2) {
-                               spin_lock_irqsave(&fib->event_lock, flagv);
+                               spin_lock_irqsave(&dev->manage_lock, flagv);
+                               dev->management_fib_count--;
+                               spin_unlock(&dev->manage_lock);
+
+                               spin_lock(&fib->event_lock);
                                fib->done = 0;
                                spin_unlock_irqrestore(&fib->event_lock, flagv);
                                aac_fib_complete(fib);
@@ -241,7 +241,6 @@ unsigned int aac_command_normal(struct aac_queue *q)

 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
-       unsigned long mflags;
        dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
        if ((index & 0x00000002L)) {
                struct hw_fib * hw_fib;
@@ -336,13 +335,13 @@ unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
                        }
                        spin_unlock_irqrestore(&fib->event_lock, flagv);

-                       spin_lock_irqsave(&dev->manage_lock, mflags);
-                       dev->management_fib_count--;
-                       spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
                        FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
                        if (fib->done == 2) {
-                               spin_lock_irqsave(&fib->event_lock, flagv);
+                               spin_lock_irqsave(&dev->manage_lock, flagv);
+                               dev->management_fib_count--;
+                               spin_unlock(&dev->manage_lock);
+
+                               spin_lock(&fib->event_lock);
                                fib->done = 0;
                                spin_unlock_irqrestore(&fib->event_lock, flagv);
                                aac_fib_complete(fib);




DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-05 13:27 Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-11 19:26 ` James Bottomley
  2009-11-12  8:28   ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-11-11 19:26 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

On Thu, 2009-11-05 at 18:57 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
> 
>        I would request you that please review the attached updated patch with the following modification, which is suggested by James in the previous mails and do the needful to incorporate the patch in up-coming release.
> 
> 
> else if (down_interruptible(&fibptr->event_wait)) {
>         /* Do nothing ... satisfy
>          * down_interruptible must_check */
>  }
> 

Yes, that's fine.

The management fib stuff is still racy as I pointed out previously. The
attached should fix it up, is this OK with you?

James

---

diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index b1fc0a0..795fb05 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -855,10 +855,11 @@ int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 		spin_unlock_irqrestore(&dev->manage_lock, mflags);
 		return -EBUSY;
 	}
+	dev->management_fib_count++;
 	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
 	if (status != -ENOTTY)
-		return status;
+		goto out;
 
 	switch (cmd) {
 	case FSACTL_MINIPORT_REV_CHECK:
@@ -887,6 +888,13 @@ int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 		status = -ENOTTY;
 		break;
 	}
+ out:
+	if (status != -ERESTARTSYS) {
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		--dev->management_fib_count;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+	}
+		
 	return status;
 }
 
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 99729a6..e00c6a9 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -494,13 +494,8 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
 	 */
 
 	if (wait) {
-		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 
-		spin_lock_irqsave(&dev->manage_lock, mflags);
-		dev->management_fib_count++;
-		spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
diff --git a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
index 9c7408f..d06bfd2 100644
--- a/drivers/scsi/aacraid/dpcsup.c
+++ b/drivers/scsi/aacraid/dpcsup.c
@@ -57,7 +57,7 @@ unsigned int aac_response_normal(struct aac_queue * q)
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags, mflags;
+	unsigned long flags;
 
 	spin_lock_irqsave(q->lock, flags);
 	/*
@@ -131,13 +131,13 @@ unsigned int aac_response_normal(struct aac_queue * q)
 			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
 
-			spin_lock_irqsave(&dev->manage_lock, mflags);
-			dev->management_fib_count--;
-			spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
-				spin_lock_irqsave(&fib->event_lock, flagv);
+				spin_lock_irqsave(&dev->manage_lock, flagv);
+				dev->management_fib_count--;
+				spin_unlock(&dev->manage_lock);
+
+				spin_lock(&fib->event_lock);
 				fib->done = 0;
 				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
@@ -241,7 +241,6 @@ unsigned int aac_command_normal(struct aac_queue *q)
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
-	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -336,13 +335,13 @@ unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
 
-			spin_lock_irqsave(&dev->manage_lock, mflags);
-			dev->management_fib_count--;
-			spin_unlock_irqrestore(&dev->manage_lock, mflags);
-
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
-				spin_lock_irqsave(&fib->event_lock, flagv);
+				spin_lock_irqsave(&dev->manage_lock, flagv);
+				dev->management_fib_count--;
+				spin_unlock(&dev->manage_lock);
+
+				spin_lock(&fib->event_lock);
 				fib->done = 0;
 				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);




^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-11-05 13:27 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-11 19:26 ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-05 13:27 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3797 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch with the following modification, which is suggested by James in the previous mails and do the needful to incorporate the patch in up-coming release.


else if (down_interruptible(&fibptr->event_wait)) {
        /* Do nothing ... satisfy
         * down_interruptible must_check */
 }


From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

        The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

        These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
-------
        False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24701.patch --]
[-- Type: application/octet-stream, Size: 14706 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-09-25 11:43:12.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-09-25 12:13:55.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24701
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-09-25 12:07:44.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
 int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 {
 	int status;
+	unsigned long mflags;
 
 	/*
 	 *	HBA gets first crack
 	 */
 
+	spin_lock_irqsave(&dev->manage_lock, mflags);
+	if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
+		printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-09-25 11:54:23.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-11-05 16:49:13.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,27 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -516,14 +536,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +710,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +731,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1384,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1791,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1802,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-09-25 12:08:44.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

[-- Attachment #3: community_aac24701.email --]
[-- Type: application/octet-stream, Size: 693 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 +++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   37 +++++++++++++++----------
 drivers/scsi/aacraid/comminit.c |    6 +++-
 drivers/scsi/aacraid/commsup.c  |   59 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++++++----
 6 files changed, 151 insertions(+), 44 deletions(-)

Sincerely - Penchala Narasimha Reddy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-04 15:30         ` James Bottomley
  0 siblings, 0 replies; 40+ messages in thread
From: James Bottomley @ 2009-11-04 15:30 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Wed, 2009-11-04 at 12:51 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> > This is all spurious bracket addition; it doesn't really have any place
> > in a code fix.
> >
> > <Narasimha Reddy>: As we know that if we do not keep bracket
> > appropriately, some times compilers will behave differently and it may
> > use different instructions and evaluate differently. So to avoid those
> > kinds of issues, we added those brackets. It is not a code fix, but it
> > is compiler fix (compilers should not interpret differently). If
> > compiler interprets differently then the logic may behave differently
> > as against to your intentional behavior.  I hope you agree with me.
> 
> Not really ... in principle we try to fix the compilers rather than work
> around their bugs.  Which compilers are exhibiting the problem? ...
> because they'll affect more than just aacraid.
> 
> <Narasimha Reddy>
> 
> I hope you agree with Stefen's explanation.

Yes ... rechecked the code I'd missed the operator precedence problem
with :?

>  Generally, gnu compilers are buggy so the expression after the patch
> is more transparent and right one.

Well, generally gnu compilers aren't buggy ... plus this is a red
herring.  The operator precedence is defined by standard to be the same
for all compilers, so they would all interpret the expressions you're
changing in the same way.

> That is how a good c-programmer has to do to avoid such kind of
> issues. That is what we have done in the patch.

The issue, surely, is a simple operator precedence related bug ... it's
nothing to do with programmer practice or compilers.

> > > @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
> > >  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
> > >  {
> > >         int status;
> > > +       unsigned long mflags;
> > >
> > >         /*
> > >          *      HBA gets first crack
> > >          */
> > >
> > > +       spin_lock_irqsave(&dev->manage_lock, mflags);
> > > +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> > > +               printk(KERN_INFO "No management Fibs Available:%d\n",
> > > +                                               dev->management_fib_count);
> > > +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> > > +               return -EBUSY;
> > > +       }
> > > +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
> > >         status = aac_dev_ioctl(dev, cmd, arg);
> >
> > I can see what you're trying to do: limit the number of ioctl created
> > fibs to prevent starvation, but this mechanism is completely racy.  The
> > check is too far away from the increment.  It looks fixable just by
> > doing the increment within the lock here and decrementing if something
> > fails.
> >
> >
> > <Narasimha Reddy>: My intention is not to prevent starvation, but my
> > intention is that management fibs should not cross more than
> > "AAC_NUM_MGT_FIB". During our testing, we have seen any racy
> > condition.
> 
> The race is pretty obvious.  You lock check and drop the lock.  Multiple
> threads can go through that code at the same time, pass and then take
> the count above AAC_NUM_MGT_FIB.
> 
> I think currently you may have a mediation on the BKL which will
> mitigate this race, but that would go when the BKL is removed, so you
> need to be ready for that.
> 
> 
> <Narasimha Reddy> we will keep this in mind and we watch it during our
> testing. If we see such kind of issue then we will do appropriate
> changes and give you a separate small patch.

Why not just fix it now rather than wait for problems to appear due to
an unrelated change at a later time.  To make this cast iron, you do:

       spin_lock_irqsave(&dev->manage_lock, mflags);
       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
               printk(KERN_INFO "No management Fibs Available:%d\n",
                                           dev->management_fib_count);
               spin_unlock_irqrestore(&dev->manage_lock, mflags);
               return -EBUSY;
       }
       dev->management_fib_count++;
       spin_unlock_irqrestore(&dev->manage_lock, mflags);

and decrement under lock on every exit path (or even and more simply use
a counting semaphore).

James



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 14:40     ` James Bottomley
  2009-11-04  1:23       ` Stefan Richter
@ 2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04 15:30         ` James Bottomley
  1 sibling, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-04  7:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,

My answers are inline to your queries.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Tuesday, November 03, 2009 8:11 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-11-03 at 15:47 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> This is all spurious bracket addition; it doesn't really have any place
> in a code fix.
>
> <Narasimha Reddy>: As we know that if we do not keep bracket
> appropriately, some times compilers will behave differently and it may
> use different instructions and evaluate differently. So to avoid those
> kinds of issues, we added those brackets. It is not a code fix, but it
> is compiler fix (compilers should not interpret differently). If
> compiler interprets differently then the logic may behave differently
> as against to your intentional behavior.  I hope you agree with me.

Not really ... in principle we try to fix the compilers rather than work
around their bugs.  Which compilers are exhibiting the problem? ...
because they'll affect more than just aacraid.

<Narasimha Reddy>

I hope you agree with Stefen's explanation. Generally, gnu compilers are buggy so the expression after the patch is more transparent and right one.
That is how a good c-programmer has to do to avoid such kind of issues. That is what we have done in the patch.


> > @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
> >  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
> >  {
> >         int status;
> > +       unsigned long mflags;
> >
> >         /*
> >          *      HBA gets first crack
> >          */
> >
> > +       spin_lock_irqsave(&dev->manage_lock, mflags);
> > +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> > +               printk(KERN_INFO "No management Fibs Available:%d\n",
> > +                                               dev->management_fib_count);
> > +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> > +               return -EBUSY;
> > +       }
> > +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
> >         status = aac_dev_ioctl(dev, cmd, arg);
>
> I can see what you're trying to do: limit the number of ioctl created
> fibs to prevent starvation, but this mechanism is completely racy.  The
> check is too far away from the increment.  It looks fixable just by
> doing the increment within the lock here and decrementing if something
> fails.
>
>
> <Narasimha Reddy>: My intention is not to prevent starvation, but my
> intention is that management fibs should not cross more than
> "AAC_NUM_MGT_FIB". During our testing, we have seen any racy
> condition.

The race is pretty obvious.  You lock check and drop the lock.  Multiple
threads can go through that code at the same time, pass and then take
the count above AAC_NUM_MGT_FIB.

I think currently you may have a mediation on the BKL which will
mitigate this race, but that would go when the BKL is removed, so you
need to be ready for that.


<Narasimha Reddy> we will keep this in mind and we watch it during our testing. If we see such kind of issue then we will do appropriate changes and give you a separate small patch.

James

> Even no end user has reported such kind of problem. We have given a
> private build to end customers two months back. So far nobody reported
> the issue.
>
> James
>
>
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  1:23       ` Stefan Richter
  2009-11-04  1:33         ` Stefan Richter
@ 2009-11-04  6:44         ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-04  6:44 UTC (permalink / raw)
  To: Stefan Richter, James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi Stefan,

Thanks a lot for your nice feedback. 
I hope that James will agree with Stefan's explanation.

Expression before the patch:	a > (b & c) ? d : e
Expression after the patch:	a > ((b & c) ? d : e)

The expression after the patch is transparent and right one.

Thanks,
Narasimha Reddy

-----Original Message-----
From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de] 
Sent: Wednesday, November 04, 2009 6:54 AM
To: James Bottomley
Cc: Penchala Narasimha Reddy Chilakala, TLS-Chennai; 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

James Bottomley wrote:
> On Tue, 2009-11-03 at 15:47 +0530, Penchala Narasimha Reddy Chilakala,
> TLS-Chennai wrote:
>> This is all spurious bracket addition; it doesn't really have any place
>> in a code fix.
>>
>> <Narasimha Reddy>: As we know that if we do not keep bracket
>> appropriately, some times compilers will behave differently and it may
>> use different instructions and evaluate differently. So to avoid those
>> kinds of issues, we added those brackets. It is not a code fix, but it
>> is compiler fix (compilers should not interpret differently). If
>> compiler interprets differently then the logic may behave differently
>> as against to your intentional behavior.  I hope you agree with me.
> 
> Not really ... in principle we try to fix the compilers rather than work
> around their bugs.  Which compilers are exhibiting the problem? ...
> because they'll affect more than just aacraid.

Huh?

Expression before the patch:	a > (b & c) ? d : e
Expression after the patch:	a > ((b & c) ? d : e)

That's clearly different; a C compiler would have to be extremely buggy
to get such expressions wrong.  So which is really the right one; before
or after?

Before:

if (upsg->sg[i].count > (dev->adapter_info.options & AAC_OPT_NEW_COMM)
        ? (dev->scsi_host_ptr->max_sectors << 9) : 65536) {

After:

if (upsg->sg[i].count > (
          (dev->adapter_info.options & AAC_OPT_NEW_COMM)
                 ? (dev->scsi_host_ptr->max_sectors << 9) : 65536
    )) {
-- 
Stefan Richter
-=====-==--= =-== --=--
http://arcgraph.de/sr/

DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  1:33         ` Stefan Richter
@ 2009-11-04  1:40           ` Stefan Richter
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Richter @ 2009-11-04  1:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Penchala Narasimha Reddy Chilakala, TLS-Chennai,
	'linux-scsi@vger.kernel.org',
	ServeRAID Driver

Stefan Richter wrote:
> Stefan Richter wrote:
>> Before:
>>
>> if (upsg->sg[i].count > (dev->adapter_info.options & AAC_OPT_NEW_COMM)
>>         ? (dev->scsi_host_ptr->max_sectors << 9) : 65536) {
> 
> Doesn't make sense.  dev->adapter_info.options is a __le32 bitfield.
> 
>> After:
>>
>> if (upsg->sg[i].count > (
>>           (dev->adapter_info.options & AAC_OPT_NEW_COMM)
>>                  ? (dev->scsi_host_ptr->max_sectors << 9) : 65536
>>     )) {
> 
> Makes sense.  Don't know if it's correct.

PS:  These things wouldn't happen if
  a) the coding wouldn't be as deeply nested,
  b) sparse with endianess check would be used for QA.
(OK, it's apparently a decade old code.)
-- 
Stefan Richter
-=====-==--= =-== --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-04  1:23       ` Stefan Richter
@ 2009-11-04  1:33         ` Stefan Richter
  2009-11-04  1:40           ` Stefan Richter
  2009-11-04  6:44         ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Richter @ 2009-11-04  1:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: Penchala Narasimha Reddy Chilakala, TLS-Chennai,
	'linux-scsi@vger.kernel.org',
	ServeRAID Driver

Stefan Richter wrote:
> James Bottomley wrote:
>>> This is all spurious bracket addition; it doesn't really have any place
>>> in a code fix.
[...]
> Before:
> 
> if (upsg->sg[i].count > (dev->adapter_info.options & AAC_OPT_NEW_COMM)
>         ? (dev->scsi_host_ptr->max_sectors << 9) : 65536) {

Doesn't make sense.  dev->adapter_info.options is a __le32 bitfield.

> After:
> 
> if (upsg->sg[i].count > (
>           (dev->adapter_info.options & AAC_OPT_NEW_COMM)
>                  ? (dev->scsi_host_ptr->max_sectors << 9) : 65536
>     )) {

Makes sense.  Don't know if it's correct.
-- 
Stefan Richter
-=====-==--= =-== --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 14:40     ` James Bottomley
@ 2009-11-04  1:23       ` Stefan Richter
  2009-11-04  1:33         ` Stefan Richter
  2009-11-04  6:44         ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 2 replies; 40+ messages in thread
From: Stefan Richter @ 2009-11-04  1:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Penchala Narasimha Reddy Chilakala, TLS-Chennai,
	'linux-scsi@vger.kernel.org',
	ServeRAID Driver

James Bottomley wrote:
> On Tue, 2009-11-03 at 15:47 +0530, Penchala Narasimha Reddy Chilakala,
> TLS-Chennai wrote:
>> This is all spurious bracket addition; it doesn't really have any place
>> in a code fix.
>>
>> <Narasimha Reddy>: As we know that if we do not keep bracket
>> appropriately, some times compilers will behave differently and it may
>> use different instructions and evaluate differently. So to avoid those
>> kinds of issues, we added those brackets. It is not a code fix, but it
>> is compiler fix (compilers should not interpret differently). If
>> compiler interprets differently then the logic may behave differently
>> as against to your intentional behavior.  I hope you agree with me.
> 
> Not really ... in principle we try to fix the compilers rather than work
> around their bugs.  Which compilers are exhibiting the problem? ...
> because they'll affect more than just aacraid.

Huh?

Expression before the patch:	a > (b & c) ? d : e
Expression after the patch:	a > ((b & c) ? d : e)

That's clearly different; a C compiler would have to be extremely buggy
to get such expressions wrong.  So which is really the right one; before
or after?

Before:

if (upsg->sg[i].count > (dev->adapter_info.options & AAC_OPT_NEW_COMM)
        ? (dev->scsi_host_ptr->max_sectors << 9) : 65536) {

After:

if (upsg->sg[i].count > (
          (dev->adapter_info.options & AAC_OPT_NEW_COMM)
                 ? (dev->scsi_host_ptr->max_sectors << 9) : 65536
    )) {
-- 
Stefan Richter
-=====-==--= =-== --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-03 10:17   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-11-03 14:40     ` James Bottomley
  2009-11-04  1:23       ` Stefan Richter
  2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 2 replies; 40+ messages in thread
From: James Bottomley @ 2009-11-03 14:40 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Tue, 2009-11-03 at 15:47 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> This is all spurious bracket addition; it doesn't really have any place
> in a code fix.
> 
> <Narasimha Reddy>: As we know that if we do not keep bracket
> appropriately, some times compilers will behave differently and it may
> use different instructions and evaluate differently. So to avoid those
> kinds of issues, we added those brackets. It is not a code fix, but it
> is compiler fix (compilers should not interpret differently). If
> compiler interprets differently then the logic may behave differently
> as against to your intentional behavior.  I hope you agree with me.

Not really ... in principle we try to fix the compilers rather than work
around their bugs.  Which compilers are exhibiting the problem? ...
because they'll affect more than just aacraid.

> > @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
> >  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
> >  {
> >         int status;
> > +       unsigned long mflags;
> >
> >         /*
> >          *      HBA gets first crack
> >          */
> >
> > +       spin_lock_irqsave(&dev->manage_lock, mflags);
> > +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> > +               printk(KERN_INFO "No management Fibs Available:%d\n",
> > +                                               dev->management_fib_count);
> > +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> > +               return -EBUSY;
> > +       }
> > +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
> >         status = aac_dev_ioctl(dev, cmd, arg);
> 
> I can see what you're trying to do: limit the number of ioctl created
> fibs to prevent starvation, but this mechanism is completely racy.  The
> check is too far away from the increment.  It looks fixable just by
> doing the increment within the lock here and decrementing if something
> fails.
> 
> 
> <Narasimha Reddy>: My intention is not to prevent starvation, but my
> intention is that management fibs should not cross more than
> "AAC_NUM_MGT_FIB". During our testing, we have seen any racy
> condition.

The race is pretty obvious.  You lock check and drop the lock.  Multiple
threads can go through that code at the same time, pass and then take
the count above AAC_NUM_MGT_FIB.

I think currently you may have a mediation on the BKL which will
mitigate this race, but that would go when the BKL is removed, so you
need to be ready for that.

James

> Even no end user has reported such kind of problem. We have given a
> private build to end customers two months back. So far nobody reported
> the issue.
> 
> James
> 
> 
> 
> 
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
> 
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
> received this email in error please delete it and notify the sender immediately. Before opening any mail and 
> attachments please check them for viruses and defect.
> 
> -----------------------------------------------------------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-11-02 17:46 ` James Bottomley
@ 2009-11-03 10:17   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-03 14:40     ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-03 10:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

Hi James,

My answers are inline to your queries.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Monday, November 02, 2009 11:17 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:

> @@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 u64 addr;
>                                 void* p;
>                                 if (upsg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 u64 addr;
>                                 void* p;
>                                 if (usg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 uintptr_t addr;
>                                 void* p;
>                                 if (usg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 dma_addr_t addr;
>                                 void* p;
>                                 if (upsg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }

This is all spurious bracket addition; it doesn't really have any place
in a code fix.

<Narasimha Reddy>: As we know that if we do not keep bracket appropriately, some times compilers will behave differently and it may use different instructions and evaluate differently. So to avoid those kinds of issues, we added those brackets. It is not a code fix, but it is compiler fix (compilers should not interpret differently). If compiler interprets differently then the logic may behave differently as against to your intentional behavior.  I hope you agree with me.



> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;
>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);

I can see what you're trying to do: limit the number of ioctl created
fibs to prevent starvation, but this mechanism is completely racy.  The
check is too far away from the increment.  It looks fixable just by
doing the increment within the lock here and decrementing if something
fails.


<Narasimha Reddy>: My intention is not to prevent starvation, but my intention is that management fibs should not cross more than "AAC_NUM_MGT_FIB". During our testing, we have seen any racy condition.
Even no end user has reported such kind of problem. We have given a private build to end customers two months back. So far nobody reported the issue.

James




DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-09-29 14:19 ` James Bottomley
@ 2009-11-02 17:46 ` James Bottomley
  2009-11-03 10:17   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  1 sibling, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-11-02 17:46 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:

> @@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 u64 addr;
>                                 void* p;
>                                 if (upsg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 u64 addr;
>                                 void* p;
>                                 if (usg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 uintptr_t addr;
>                                 void* p;
>                                 if (usg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }
> @@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
>                                 dma_addr_t addr;
>                                 void* p;
>                                 if (upsg->sg[i].count >
> -                                   (dev->adapter_info.options &
> +                                   ((dev->adapter_info.options &
>                                      AAC_OPT_NEW_COMM) ?
>                                       (dev->scsi_host_ptr->max_sectors
> << 9) :
> -                                     65536) {
> +                                     65536)) {
>                                         rcode = -EINVAL;
>                                         goto cleanup;
>                                 }

This is all spurious bracket addition; it doesn't really have any place
in a code fix.


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;
>  
>         /*
>          *      HBA gets first crack
>          */
>  
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);

I can see what you're trying to do: limit the number of ioctl created
fibs to prevent starvation, but this mechanism is completely racy.  The
check is too far away from the increment.  It looks fixable just by
doing the increment within the lock here and decrementing if something
fails.

James




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-10-22  1:09 ` James Bottomley
@ 2009-10-23 13:10   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-10-23 13:10 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,

Thanks a lot for your feedback.

Here is my explanation for the change. We have done good amount of review and testing before finalizing the change. We found the following code also caused to exit the management request prematurely without getting response from the firmware during our review, which in turn caused to generated False Raid Alert events in the application layer. 

So after doing good amount of review and testing, we replace the below code

	    else if (down_interruptible(&fibptr->event_wait)) {
                      fibptr->done = 2;
                      up(&fibptr->event_wait);
          }

With the following code
	    } else
          	down_interruptible (&fibptr->event_wait);

To overcome the warning, we can replace the above code with either

    	    } else if (down_interruptible (&fibptr->event_wait));
               
			Or

	    } else {
          	 if (down_interruptible (&fibptr->event_wait))
			;
	    }

I am ok to do the above mentioned change in the code and resubmit the patch with the above change as there is no issue with respect to functionality and other aspects as such.

Please let me know your opinion on this so that I will proceed further based on your feedback. 

 
Important note: 
-----------------

fibptr->event_wait is initialized with the following OS kernel function call during the driver initialization.

	init_MUTEX_LOCKED(&fibptr->event_wait);

The above OS kernel function call initializes the semaphore to locked (unavailable) state. On using the fibptr->event_wait by down_interruptible, the semaphore is not available so process/task will enter into sleep mode until the semaphore is available. The semaphore will be available only when the response gets from the firmware for the request delivered to the firmware. When response comes from the firmware then driver's response path code calls the up() if the fibpter->done is equal to zero. So the corresponding task/process gets semaphore and wakes the task/process up and proceeds with further processing. 

Suppose the process/task, who is waiting for semaphore, gets exited before getting response from the firmware due to some signal, then driver should not call up() function as there is no point calling up() function after exiting as the fibptr->event_wait was initialized to locked (unavailable) state during the driver initialization. 

So my point, here, is that we should not call up() if process/task is exited prematurely before getting the response from the firmware for the request delivered to firmware as the fibptr->event_wait was initialized to locked (unavailable) state. 
---------------------------------------------------------------------------

Thanks,
Narasimha Reddy 

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Thursday, October 22, 2009 6:40 AM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Fri, 2009-10-09 at 14:53 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
> Can you please let us know your feedback on my response to the queries
> raised by James so that I will proceed further based on your feedback?

Sorry, press of conferences has delayed me getting around to looking at
this.

The patch is spitting this compile warning:

drivers/scsi/aacraid/commsup.c: In function 'aac_fib_send':
drivers/scsi/aacraid/commsup.c:539: warning: ignoring return value of 'down_interruptible', declared with attribute warn_unused_result

Because of this hunk:

@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned
                                }
                                udelay(5);
                        }
-               } else if (down_interruptible(&fibptr->event_wait)) {
-                       fibptr->done = 2;
-                       up(&fibptr->event_wait);
-               }
+               } else
+                       down_interruptible(&fibptr->event_wait);


The reason for the warning is that down_interruptible() will return an
error if it gets interrupted without acquiring the semaphore, and the
kernel checkers believe you can't write correct code without knowing
whether you acquired the semaphore or not.

Now, the original use case didn't care it was just waiting for the
semaphore to become available but didn't actually want to acquire it, so
I suspect what you want is:

} else if (down_interruptible(&fibptr->event_wait))
                     up(&fibptr->event_wait);

If you suddenly do care about acquiring the semaphore, you need to do
something about the interrupted failure case.

James




DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-10-09  9:23 Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-10-22  1:09 ` James Bottomley
  2009-10-23 13:10   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-10-22  1:09 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Fri, 2009-10-09 at 14:53 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
> 
> Can you please let us know your feedback on my response to the queries
> raised by James so that I will proceed further based on your feedback?

Sorry, press of conferences has delayed me getting around to looking at
this.

The patch is spitting this compile warning:

drivers/scsi/aacraid/commsup.c: In function 'aac_fib_send':
drivers/scsi/aacraid/commsup.c:539: warning: ignoring return value of 'down_interruptible', declared with attribute warn_unused_result

Because of this hunk:

@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned
                                }
                                udelay(5);
                        }
-               } else if (down_interruptible(&fibptr->event_wait)) {
-                       fibptr->done = 2;
-                       up(&fibptr->event_wait);
-               }
+               } else
+                       down_interruptible(&fibptr->event_wait);


The reason for the warning is that down_interruptible() will return an
error if it gets interrupted without acquiring the semaphore, and the
kernel checkers believe you can't write correct code without knowing
whether you acquired the semaphore or not.

Now, the original use case didn't care it was just waiting for the
semaphore to become available but didn't actually want to acquire it, so
I suspect what you want is:

} else if (down_interruptible(&fibptr->event_wait))
                     up(&fibptr->event_wait);

If you suddenly do care about acquiring the semaphore, you need to do
something about the interrupted failure case.

James




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-10-09  9:23 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-10-22  1:09 ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-10-09  9:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James and linux-scsi community,

Can you please let us know your feedback on my response to the queries raised by James so that I will proceed further based on your feedback?

Thanks,
Narasimha Reddy 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, October 07, 2009 11:33 AM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode


Hi James and linux-scsi community,

Have you had a chance to look at my inline responses to the queries raised by James?

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 5:04 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai; James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Sorry for missing a word "not" in the previous mail for the following inline response.

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Previous response> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.

<Correct response>
I do not have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.



Thanks,
Narasimha Reddy
 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 4:40 PM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Thank you very much for your valuable feedback.

Please have a look at my inline responses and let me your opinion so that I will go ahead and do the necessary changes if required.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Tuesday, September 29, 2009 7:50 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
>
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
>
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
>
> Regarding the testing of fixes:
> --------------------------------
>
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
>
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
>
> Issue1: File System going into read-only mode
> ---------
>
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
>
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
>
>
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
>
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
>
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
>
>
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

<Narasimha Reddy> Actually, the issues are inter-linked each other even though customers reported the issues separately. So I feel that it would be better go with as single patch instead of separate patches.

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...

<Narasimha Reddy>:   Yes, I agree with your comment, but the disclaimer will be added by HCL email-server for all outgoing mails outside HCL. I will talk to HCL IT service about this if it is serious issue. Please let me know.


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

<Narasimha Reddy>: I accept your suggestion as we can avoid two times execution of the same condition and redundancy. Earlier there was no check and comment before calling the aac_fib_complete function. We added a check to fix the issues and comment to understand that why we have done the change and it is easy for reviewing when we have comments like this. 

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Narasimha Reddy> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files. 

>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
You'll need to do an atomic test and increment somewhere.

<Narasimha Reddy>  We have tested it extensively this in our lab and client as well, we have not seen any race condition here as it is a starting point for any management commands, it is meant only for management commands, and not meant for IO commands. I strongly feel that there would not be any race condition.

For example:

1. Assume driver received a management comment from a application layer, 
2. On receiving, driver checks whether management fib count reached to its limit or not. 
3. If it reached to its limit, then it will return as -EBUSY to the application layer saying that no management fibs are available at this point of time. That means currently, all management fibs are in use. 

Other than the above, the code in here looks fine.

Thanks,

James





^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-09-30 11:33     ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-10-07  6:02       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-10-07  6:02 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver


Hi James and linux-scsi community,

Have you had a chance to look at my inline responses to the queries raised by James?

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 5:04 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai; James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Sorry for missing a word "not" in the previous mail for the following inline response.

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Previous response> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.

<Correct response>
I do not have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.



Thanks,
Narasimha Reddy
 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 4:40 PM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Thank you very much for your valuable feedback.

Please have a look at my inline responses and let me your opinion so that I will go ahead and do the necessary changes if required.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Tuesday, September 29, 2009 7:50 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
>
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
>
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
>
> Regarding the testing of fixes:
> --------------------------------
>
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
>
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
>
> Issue1: File System going into read-only mode
> ---------
>
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
>
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
>
>
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
>
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
>
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
>
>
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

<Narasimha Reddy> Actually, the issues are inter-linked each other even though customers reported the issues separately. So I feel that it would be better go with as single patch instead of separate patches.

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...

<Narasimha Reddy>:   Yes, I agree with your comment, but the disclaimer will be added by HCL email-server for all outgoing mails outside HCL. I will talk to HCL IT service about this if it is serious issue. Please let me know.


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

<Narasimha Reddy>: I accept your suggestion as we can avoid two times execution of the same condition and redundancy. Earlier there was no check and comment before calling the aac_fib_complete function. We added a check to fix the issues and comment to understand that why we have done the change and it is easy for reviewing when we have comments like this. 

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Narasimha Reddy> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files. 

>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
You'll need to do an atomic test and increment somewhere.

<Narasimha Reddy>  We have tested it extensively this in our lab and client as well, we have not seen any race condition here as it is a starting point for any management commands, it is meant only for management commands, and not meant for IO commands. I strongly feel that there would not be any race condition.

For example:

1. Assume driver received a management comment from a application layer, 
2. On receiving, driver checks whether management fib count reached to its limit or not. 
3. If it reached to its limit, then it will return as -EBUSY to the application layer saying that no management fibs are available at this point of time. That means currently, all management fibs are in use. 

Other than the above, the code in here looks fine.

Thanks,

James





^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-09-30 11:09   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-09-30 11:33     ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-10-07  6:02       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-09-30 11:33 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai, James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James and linux-scsi community,

Sorry for missing a word "not" in the previous mail for the following inline response.

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Previous response> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.

<Correct response>
I do not have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.



Thanks,
Narasimha Reddy
 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 4:40 PM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Thank you very much for your valuable feedback.

Please have a look at my inline responses and let me your opinion so that I will go ahead and do the necessary changes if required.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Tuesday, September 29, 2009 7:50 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
>
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
>
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
>
> Regarding the testing of fixes:
> --------------------------------
>
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
>
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
>
> Issue1: File System going into read-only mode
> ---------
>
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
>
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
>
>
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
>
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
>
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
>
>
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

<Narasimha Reddy> Actually, the issues are inter-linked each other even though customers reported the issues separately. So I feel that it would be better go with as single patch instead of separate patches.

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...

<Narasimha Reddy>:   Yes, I agree with your comment, but the disclaimer will be added by HCL email-server for all outgoing mails outside HCL. I will talk to HCL IT service about this if it is serious issue. Please let me know.


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

<Narasimha Reddy>: I accept your suggestion as we can avoid two times execution of the same condition and redundancy. Earlier there was no check and comment before calling the aac_fib_complete function. We added a check to fix the issues and comment to understand that why we have done the change and it is easy for reviewing when we have comments like this. 

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Narasimha Reddy> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files. 

>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
You'll need to do an atomic test and increment somewhere.

<Narasimha Reddy>  We have tested it extensively this in our lab and client as well, we have not seen any race condition here as it is a starting point for any management commands, it is meant only for management commands, and not meant for IO commands. I strongly feel that there would not be any race condition.

For example:

1. Assume driver received a management comment from a application layer, 
2. On receiving, driver checks whether management fib count reached to its limit or not. 
3. If it reached to its limit, then it will return as -EBUSY to the application layer saying that no management fibs are available at this point of time. That means currently, all management fibs are in use. 

Other than the above, the code in here looks fine.

Thanks,

James





^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-09-29 14:19 ` James Bottomley
@ 2009-09-30 11:09   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-09-30 11:33     ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-09-30 11:09 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James and linux-scsi community,

Thank you very much for your valuable feedback.

Please have a look at my inline responses and let me your opinion so that I will go ahead and do the necessary changes if required.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Tuesday, September 29, 2009 7:50 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
>
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
>
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
>
> Regarding the testing of fixes:
> --------------------------------
>
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
>
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
>
> Issue1: File System going into read-only mode
> ---------
>
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
>
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
>
>
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
>
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
>
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
>
>
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

<Narasimha Reddy> Actually, the issues are inter-linked each other even though customers reported the issues separately. So I feel that it would be better go with as single patch instead of separate patches.

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...

<Narasimha Reddy>:   Yes, I agree with your comment, but the disclaimer will be added by HCL email-server for all outgoing mails outside HCL. I will talk to HCL IT service about this if it is serious issue. Please let me know.


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

<Narasimha Reddy>: I accept your suggestion as we can avoid two times execution of the same condition and redundancy. Earlier there was no check and comment before calling the aac_fib_complete function. We added a check to fix the issues and comment to understand that why we have done the change and it is easy for reviewing when we have comments like this.

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Narasimha Reddy> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.

>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
You'll need to do an atomic test and increment somewhere.

<Narasimha Reddy>  We have tested it extensively this in our lab and client as well, we have not seen any race condition here as it is a starting point for any management commands, it is meant only for management commands, and not meant for IO commands. I strongly feel that there would not be any race condition.

For example:

1. Assume driver received a management comment from a application layer,
2. On receiving, driver checks whether management fib count reached to its limit or not.
3. If it reached to its limit, then it will return as -EBUSY to the application layer saying that no management fibs are available at this point of time. That means currently, all management fibs are in use.

Other than the above, the code in here looks fine.

Thanks,

James





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
  2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
@ 2009-09-29 14:19 ` James Bottomley
  2009-09-30 11:09   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-02 17:46 ` James Bottomley
  1 sibling, 1 reply; 40+ messages in thread
From: James Bottomley @ 2009-09-29 14:19 UTC (permalink / raw)
  To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
  Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
> 
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
> 
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
> 
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
> 
> Regarding the testing of fixes:
> --------------------------------
> 
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
> 
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
> 
> Issue1: File System going into read-only mode
> ---------
> 
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
> 
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
> 
> 
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
> 
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
> 
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
> 
> 
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
> 
> 
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
> 
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
> received this email in error please delete it and notify the sender immediately. Before opening any mail and 
> attachments please check them for viruses and defect.
> 
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

>  
>         /*
>          *      HBA gets first crack
>          */
>  
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
you'll need to do an atomic test and increment somewhere.

Other than the above, the code in here looks fine.

Thanks,

James




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-09-29 14:19 ` James Bottomley
  2009-11-02 17:46 ` James Bottomley
  0 siblings, 2 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-09-29  8:47 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3641 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

        The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

        These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
-------
        False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24701.email --]
[-- Type: application/octet-stream, Size: 690 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   37 +++++++++++++++---------
 drivers/scsi/aacraid/comminit.c |    6 +++
 drivers/scsi/aacraid/commsup.c  |   61 ++++++++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/dpcsup.c   |   36 +++++++++++++++++++----
 6 files changed, 151 insertions(+), 46 deletions(-)

Sincerely - Penchala Narasimha Reddy 

[-- Attachment #3: community_aac24701.patch --]
[-- Type: application/octet-stream, Size: 14698 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-09-25 11:43:12.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-09-25 12:13:55.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24701
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-09-25 12:07:44.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
 int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 {
 	int status;
+	unsigned long mflags;
 
 	/*
 	 *	HBA gets first crack
 	 */
 
+	spin_lock_irqsave(&dev->manage_lock, mflags);
+	if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
+		printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-09-25 11:54:23.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-09-25 11:57:00.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,27 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib
 				}
 				udelay(5);
 			}
-		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
-		}
+		} else
+			down_interruptible(&fibptr->event_wait);
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +708,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +729,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1382,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1789,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1800,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-09-25 12:08:44.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2009-12-24  6:31 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-02  7:35 [PATCH ] scsi-misc-2.6: File System going into read-only mode Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-02 16:30 ` Stefan Richter
2009-11-03  9:54   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 19:02     ` Stefan Richter
2009-11-04  7:00       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 14:48         ` Stefan Richter
2009-11-02 17:45 ` James Bottomley
2009-11-03  9:38   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 15:08     ` James Bottomley
2009-11-04  7:07       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 15:42         ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2009-12-24  6:29 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-12-21 13:09 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-12-18 12:25 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-05 13:27 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-11 19:26 ` James Bottomley
2009-11-12  8:28   ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-13 20:55     ` James Bottomley
2009-11-16  3:25       ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-16 15:51         ` James Bottomley
2009-11-17  5:22           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-17  5:27           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-10-09  9:23 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-10-22  1:09 ` James Bottomley
2009-10-23 13:10   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-29 14:19 ` James Bottomley
2009-09-30 11:09   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-30 11:33     ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-10-07  6:02       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-02 17:46 ` James Bottomley
2009-11-03 10:17   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 14:40     ` James Bottomley
2009-11-04  1:23       ` Stefan Richter
2009-11-04  1:33         ` Stefan Richter
2009-11-04  1:40           ` Stefan Richter
2009-11-04  6:44         ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 15:30         ` James Bottomley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.