All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-11-05 13:27 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-11 19:26 ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-05 13:27 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3797 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch with the following modification, which is suggested by James in the previous mails and do the needful to incorporate the patch in up-coming release.


else if (down_interruptible(&fibptr->event_wait)) {
        /* Do nothing ... satisfy
         * down_interruptible must_check */
 }


From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

        The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

        These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
-------
        False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24701.patch --]
[-- Type: application/octet-stream, Size: 14706 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-09-25 11:43:12.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-09-25 12:13:55.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24701
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-09-25 12:07:44.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
 int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 {
 	int status;
+	unsigned long mflags;
 
 	/*
 	 *	HBA gets first crack
 	 */
 
+	spin_lock_irqsave(&dev->manage_lock, mflags);
+	if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
+		printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-09-25 11:54:23.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-11-05 16:49:13.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,27 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -516,14 +536,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +710,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +731,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1384,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1791,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1802,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-09-25 12:08:44.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

[-- Attachment #3: community_aac24701.email --]
[-- Type: application/octet-stream, Size: 693 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 +++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   37 +++++++++++++++----------
 drivers/scsi/aacraid/comminit.c |    6 +++-
 drivers/scsi/aacraid/commsup.c  |   59 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++++++----
 6 files changed, 151 insertions(+), 44 deletions(-)

Sincerely - Penchala Narasimha Reddy

^ permalink raw reply	[flat|nested] 40+ messages in thread
* RE:  [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-24  6:29 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-24  6:29 UTC (permalink / raw)
  To: James Bottomley, 'linux-scsi; +Cc: ServeRAID Driver

Hi James and linux-scsi community,

Please let us your comments on the patch (community_aac24702.patch) we submitted on December 21, 2009.

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech 
Sent: Monday, December 21, 2009 6:39 PM
To: 'linux-scsi@vger.kernel.org'; 'James Bottomley'
Cc: ServeRAID Driver
Subject: [PATCH ] scsi-misc-2.6: File System going into read-only mode 

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch, which contains the fix for racy issue (which was raised by James) and do the needful to incorporate the patch in up-coming release. 

Note: The patch was generated against the "kernel 2.6.33-rc1"

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

	The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

	These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
------- 
	False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry. 

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.
       
Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread
* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-21 13:09 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-21 13:09 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3692 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached updated patch, which contains the fix for racy issue (which was raised by James) and do the needful to incorporate the patch in up-coming release. 

Note: The patch was generated against the "kernel 2.6.33-rc1"

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

	The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

	These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
------- 
	False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry. 

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.
       
Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24702.email --]
[-- Type: application/octet-stream, Size: 670 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++------
 drivers/scsi/aacraid/aacraid.h  |    5 ++
 drivers/scsi/aacraid/commctrl.c |   28 +++++++--------
 drivers/scsi/aacraid/comminit.c |    6 ++-
 drivers/scsi/aacraid/commsup.c  |   72 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++----
 6 files changed, 154 insertions(+), 45 deletions(-)

Sincerely - Penchala Narasimha Reddy

[-- Attachment #3: community_aac24702.patch --]
[-- Type: application/octet-stream, Size: 14750 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-12-21 12:57:48.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-12-21 12:58:44.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24702
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-12-21 12:59:28.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -848,7 +848,7 @@ int aac_do_ioctl(struct aac_dev * dev, i
 	 */
 
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-12-21 12:59:58.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-12-21 18:13:59.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -390,6 +397,8 @@ int aac_fib_send(u16 command, struct fib
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 	unsigned long flags = 0;
 	unsigned long qflags;
+	unsigned long mflags = 0;
+
 
 	if (!(hw_fib->header.XferState & cpu_to_le32(HostOwned)))
 		return -EBUSY;
@@ -471,9 +480,31 @@ int aac_fib_send(u16 command, struct fib
 	if (!dev->queues)
 		return -EBUSY;
 
-	if(wait)
+	if (wait) {
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		if (dev->management_fib_count >= AAC_NUM_MGT_FIB) {
+			printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+			return -EBUSY;
+		}
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+	}
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait) {
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		}
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
@@ -516,14 +547,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +721,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +742,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1395,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1802,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1813,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-12-18 06:44:40.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-12-21 13:01:21.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread
* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-12-18 12:25 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  0 siblings, 0 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, ERS-HCLTech @ 2009-12-18 12:25 UTC (permalink / raw)
  To: James Bottomley
  Cc: 'linux-scsi@vger.kernel.org', James Bottomley, ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

Hi James,

Please find the attached patch, in which we have fixed the race condition issue and on which we started running our test cases as well, review the patch and let us know your final comments on the patch.

Thanks,
Narasimha Reddy


-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Monday, November 16, 2009 9:21 PM
To: Penchala Narasimha Reddy Chilakala, ERS-HCLTech
Cc: 'linux-scsi@vger.kernel.org'; James Bottomley; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Mon, 2009-11-16 at 08:55 +0530, Penchala Narasimha Reddy Chilakala,
ERS-HCLTech wrote:
> Hi James,
>
> Thanks a lot for your nice review, feedback and support so far. I hope
> that we have sorted out all the issues raised by you except the racy
> issue.

Yes ... I think the race problem isn't particularly relevant to the
bugs, so send me a patch without the attempted ioctl fib counting and it
can go into -rc ... you can work on the fib counting later for
scsi-misc.

James



DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24702.email --]
[-- Type: application/octet-stream, Size: 679 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   28 ++++++++---------
 drivers/scsi/aacraid/comminit.c |    6 +++
 drivers/scsi/aacraid/commsup.c  |   65 +++++++++++++++++++++++++++++++++-------
 drivers/scsi/aacraid/dpcsup.c   |   36 ++++++++++++++++++----
 6 files changed, 148 insertions(+), 44 deletions(-)

Sincerely - Penchala Narasimha Reddy

[-- Attachment #3: community_aac24702.patch --]
[-- Type: application/octet-stream, Size: 14468 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-12-18 17:09:42.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-12-18 17:10:08.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24702
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-12-18 17:14:37.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -848,7 +848,7 @@ int aac_do_ioctl(struct aac_dev * dev, i
 	 */
 
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-12-18 17:09:42.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-12-18 17:15:10.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,33 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		if (dev->management_fib_count >= AAC_NUM_MGT_FIB) {
+			printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+			return -EBUSY;
+		}
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -516,14 +542,15 @@ int aac_fib_send(u16 command, struct fib
 				udelay(5);
 			}
 		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
+			/* Do nothing ... satisfy
+			 * down_interruptible must_check */
 		}
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +716,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +737,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1390,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1797,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1808,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-12-18 17:09:42.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread
* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-11-02  7:35 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-11-02 16:30 ` Stefan Richter
  2009-11-02 17:45 ` James Bottomley
  0 siblings, 2 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-11-02  7:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James,


Please let me know your opinion on the explanation given by me in the previous e-mail to the issue raised by you so that I will proceed further based on your feedback as lot of IBM customers and RedHat have been waiting for this patch to come in linux-scsi upstream even though we have given private build to some of the IBM customers like Cisco and SAP?

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Sent: Friday, October 23, 2009 6:41 PM
To: 'James Bottomley'
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James,

Thanks a lot for your feedback.

Here is my explanation for the change. We have done good amount of review and testing before finalizing the change. We found the following code also caused to exit the management request prematurely without getting response from the firmware during our review, which in turn caused to generated False Raid Alert events in the application layer.

So after doing good amount of review and testing, we replace the below code

            else if (down_interruptible(&fibptr->event_wait)) {
                      fibptr->done = 2;
                      up(&fibptr->event_wait);
          }

With the following code
            } else
                down_interruptible (&fibptr->event_wait);

To overcome the warning, we can replace the above code with either

            } else if (down_interruptible (&fibptr->event_wait));

                        Or

            } else {
                 if (down_interruptible (&fibptr->event_wait))
                        ;
            }

I am ok to do the above mentioned change in the code and resubmit the patch with the above change as there is no issue with respect to functionality and other aspects as such.

Please let me know your opinion on this so that I will proceed further based on your feedback.


Important note:
-----------------

fibptr->event_wait is initialized with the following OS kernel function call during the driver initialization.

        init_MUTEX_LOCKED(&fibptr->event_wait);

The above OS kernel function call initializes the semaphore to locked (unavailable) state. On using the fibptr->event_wait by down_interruptible, the semaphore is not available so process/task will enter into sleep mode until the semaphore is available. The semaphore will be available only when the response gets from the firmware for the request delivered to the firmware. When response comes from the firmware then driver's response path code calls the up() if the fibpter->done is equal to zero. So the corresponding task/process gets semaphore and wakes the task/process up and proceeds with further processing.

Suppose the process/task, who is waiting for semaphore, gets exited before getting response from the firmware due to some signal, then driver should not call up() function as there is no point calling up() function after exiting as the fibptr->event_wait was initialized to locked (unavailable) state during the driver initialization.

So my point, here, is that we should not call up() if process/task is exited prematurely before getting the response from the firmware for the request delivered to firmware as the fibptr->event_wait was initialized to locked (unavailable) state.
---------------------------------------------------------------------------

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de]
Sent: Thursday, October 22, 2009 6:40 AM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Fri, 2009-10-09 at 14:53 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
> Can you please let us know your feedback on my response to the queries
> raised by James so that I will proceed further based on your feedback?

Sorry, press of conferences has delayed me getting around to looking at
this.

The patch is spitting this compile warning:

drivers/scsi/aacraid/commsup.c: In function 'aac_fib_send':
drivers/scsi/aacraid/commsup.c:539: warning: ignoring return value of 'down_interruptible', declared with attribute warn_unused_result

Because of this hunk:

@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned
                                }
                                udelay(5);
                        }
-               } else if (down_interruptible(&fibptr->event_wait)) {
-                       fibptr->done = 2;
-                       up(&fibptr->event_wait);
-               }
+               } else
+                       down_interruptible(&fibptr->event_wait);


The reason for the warning is that down_interruptible() will return an
error if it gets interrupted without acquiring the semaphore, and the
kernel checkers believe you can't write correct code without knowing
whether you acquired the semaphore or not.

Now, the original use case didn't care it was just waiting for the
semaphore to become available but didn't actually want to acquire it, so
I suspect what you want is:

} else if (down_interruptible(&fibptr->event_wait))
                     up(&fibptr->event_wait);

If you suddenly do care about acquiring the semaphore, you need to do
something about the interrupted failure case.

James




DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 40+ messages in thread
* RE: [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-10-09  9:23 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-10-22  1:09 ` James Bottomley
  0 siblings, 1 reply; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-10-09  9:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: 'linux-scsi@vger.kernel.org', ServeRAID Driver

Hi James and linux-scsi community,

Can you please let us know your feedback on my response to the queries raised by James so that I will proceed further based on your feedback?

Thanks,
Narasimha Reddy 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, October 07, 2009 11:33 AM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode


Hi James and linux-scsi community,

Have you had a chance to look at my inline responses to the queries raised by James?

Thanks,
Narasimha Reddy

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 5:04 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai; James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Sorry for missing a word "not" in the previous mail for the following inline response.

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Previous response> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.

<Correct response>
I do not have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files.



Thanks,
Narasimha Reddy
 

-----Original Message-----
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai 
Sent: Wednesday, September 30, 2009 4:40 PM
To: James Bottomley
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: RE: [PATCH ] scsi-misc-2.6: File System going into read-only mode

Hi James and linux-scsi community,

Thank you very much for your valuable feedback.

Please have a look at my inline responses and let me your opinion so that I will go ahead and do the necessary changes if required.

Thanks,
Narasimha Reddy

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@suse.de] 
Sent: Tuesday, September 29, 2009 7:50 PM
To: Penchala Narasimha Reddy Chilakala, TLS-Chennai
Cc: 'linux-scsi@vger.kernel.org'; ServeRAID Driver
Subject: Re: [PATCH ] scsi-misc-2.6: File System going into read-only mode

On Tue, 2009-09-29 at 14:17 +0530, Penchala Narasimha Reddy Chilakala,
TLS-Chennai wrote:
> Hi James and linux-scsi community,
>
>        I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.
>
> From: Narasimha Reddy <ServeRAIDDriver@hcl.in>
>
>         The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.
>
> Regarding the testing of fixes:
> --------------------------------
>
>         These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.
>
> We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.
>
> Issue1: File System going into read-only mode
> ---------
>
> Root cause:
> -----------
>        The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.
>
> Fix details:
> ------------
>      The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).
>
>
> Issue2:
> -------
>         False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system
>
> Root cause:
> -----------
>         Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.
>
> Fix details:
> ------------
>         The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.
>
>
> Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

Would it be possible to split this into two patches, one that fixes each
issue?  That makes it easier for distro backporting and bisection
checking if something goes wrong with the fix

<Narasimha Reddy> Actually, the issues are inter-linked each other even though customers reported the issues separately. So I feel that it would be better go with as single patch instead of separate patches.

>        Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

That's fine ... I note you're using outlook, which is known to mangle
inline patches.

> Regards - Narasimha Reddy
>
>
> DISCLAIMER:
> -----------------------------------------------------------------------------------------------------------------------
>
> The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
> It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
> this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
> Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
> this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
> received this email in error please delete it and notify the sender immediately. Before opening any mail and
> attachments please check them for viruses and defect.
>
> -----------------------------------------------------------------------------------------------------------------------

It does look a bit odd to have a disclaimer like this on an email you've
just sent to a public list ...

<Narasimha Reddy>:   Yes, I agree with your comment, but the disclaimer will be added by HCL email-server for all outgoing mails outside HCL. I will talk to HCL IT service about this if it is serious issue. Please let me know.


> diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2009-09-10 03:43:59.000000000 +0530
> +++ b/drivers/scsi/aacraid/aachba.c     2009-09-25 11:43:12.000000000 +0530
> @@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
>                         status = -EINVAL;
>                 }
>         }
> -       aac_fib_complete(fibptr);
> +       /* Do not set XferState to zero unless receives a response from F/W */
> +       if (status >= 0)
> +               aac_fib_complete(fibptr);

This can just move under the if (status >= 0) below, can't it?

<Narasimha Reddy>: I accept your suggestion as we can avoid two times execution of the same condition and redundancy. Earlier there was no check and comment before calling the aac_fib_complete function. We added a check to fix the issues and comment to understand that why we have done the change and it is easy for reviewing when we have comments like this. 

> +
>         /* Send a CT_COMMIT_CONFIG to enable discovery of devices */
>         if (status >= 0) {

[...]


> @@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
>  int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
>  {
>         int status;
> +       unsigned long mflags;

flags is the usual linux convention for this ... and that's what's used
throughout the rest of the file ... consistency on flags would be better
(for all the uses).

<Narasimha Reddy> I do have any issue to replace "mflags" with "flags", but we have used this mflags in dpcsup.c, and commsup.c as well. We initially thought of using flags only, but flags variable is used by some other spinlock function in the same function in dpcsup.c so we do not have any other alternative and went with different name. So we used mflags for the management fib count across all the files. 

>
>         /*
>          *      HBA gets first crack
>          */
>
> +       spin_lock_irqsave(&dev->manage_lock, mflags);
> +       if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
> +               printk(KERN_INFO "No management Fibs Available:%d\n",
> +                                               dev->management_fib_count);
> +               spin_unlock_irqrestore(&dev->manage_lock, mflags);
> +               return -EBUSY;
> +       }
> +       spin_unlock_irqrestore(&dev->manage_lock, mflags);
>         status = aac_dev_ioctl(dev, cmd, arg);
> -       if(status != -ENOTTY)
> +       if (status != -ENOTTY)
>                 return status;

It seems to me from the above that the intention of manage_lock is to
protect management_fib_count?  If so, there's a race here, isn't there,
because you check the count under the lock, but then drop the lock which
makes the count vulnerable to change again before you get to
aac_fib_send()?  If you really don't want to go over AAC_NUM_MGT_FIB,
You'll need to do an atomic test and increment somewhere.

<Narasimha Reddy>  We have tested it extensively this in our lab and client as well, we have not seen any race condition here as it is a starting point for any management commands, it is meant only for management commands, and not meant for IO commands. I strongly feel that there would not be any race condition.

For example:

1. Assume driver received a management comment from a application layer, 
2. On receiving, driver checks whether management fib count reached to its limit or not. 
3. If it reached to its limit, then it will return as -EBUSY to the application layer saying that no management fibs are available at this point of time. That means currently, all management fibs are in use. 

Other than the above, the code in here looks fine.

Thanks,

James





^ permalink raw reply	[flat|nested] 40+ messages in thread
* [PATCH ]  scsi-misc-2.6:  File System going into read-only mode
@ 2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
  2009-09-29 14:19 ` James Bottomley
  2009-11-02 17:46 ` James Bottomley
  0 siblings, 2 replies; 40+ messages in thread
From: Penchala Narasimha Reddy Chilakala, TLS-Chennai @ 2009-09-29  8:47 UTC (permalink / raw)
  To: 'linux-scsi, James Bottomley; +Cc: ServeRAID Driver

[-- Attachment #1: Type: text/plain, Size: 3641 bytes --]

Hi James and linux-scsi community,

       I would request you that please review the attached patch and do the needful to incorporate the patch in up-coming release. I am happy to provide if any additional info is required to accept the patch.

From: Narasimha Reddy <ServeRAIDDriver@hcl.in>

        The attached patch was generated for fixing the following issues. These issues were reported by Cisco, SAP and some other customers as well.

Regarding the testing of fixes:
--------------------------------

        These particular problems were reported by Cisco and SAP and customers as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build to IBM and Cisco. Cisco and IBM tested it for more than 15 days and they reported that they did not see the issue so far. Before the fix, Cisco used to see the issue within 5 days. We generated a patch for SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the patch and gave a test build to SAP. SAP tested and reported that the build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO stress tool and the tool was provided by Cisco.

Issue1: File System going into read-only mode
---------

Root cause:
-----------
       The driver tends to not free the memory (FIB) when the management request exits prematurely. The accumulation of such un-freed memory causes the driver to fail to allocate anymore memory (FIB) and hence return 0x70000 value to the upper layer, which puts the file system into read only mode.

Fix details:
------------
     The fix makes sure to free the memory (FIB) even if the request exits prematurely hence ensuring the driver wouldn't run out of memory (FIBs).


Issue2:
-------
        False Raid Alert occurs- when the Physical Drives and Logical drives are reported as deleted or added, even though there is no change done on the system

Root cause:
-----------
        Driver IOCTLs is signaled with EINTR while waiting on response from the lower layers. Returning "EINTR" will never initiate internal retry.

Fix details:
------------
        The issue was fixed by replacing "EINTR" with "ERESTARTSYS" for mid-layer retries.


Signed-off-by: Narasimha Reddy <ServeRAIDDriver@hcl.in>

       Please find the attached patch file "community_aac24701.patch", which has been generated for the issues we fixed and "email" file "community_aac24701.email", which has been generated for the statistics like number of lines inserted and deleted, errors etc.

Regards - Narasimha Reddy


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. 
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have 
received this email in error please delete it and notify the sender immediately. Before opening any mail and 
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

[-- Attachment #2: community_aac24701.email --]
[-- Type: application/octet-stream, Size: 690 bytes --]


This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>

 drivers/scsi/aacraid/aachba.c   |   52 ++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/aacraid.h  |    5 ++-
 drivers/scsi/aacraid/commctrl.c |   37 +++++++++++++++---------
 drivers/scsi/aacraid/comminit.c |    6 +++
 drivers/scsi/aacraid/commsup.c  |   61 ++++++++++++++++++++++++++++++++--------
 drivers/scsi/aacraid/dpcsup.c   |   36 +++++++++++++++++++----
 6 files changed, 151 insertions(+), 46 deletions(-)

Sincerely - Penchala Narasimha Reddy 

[-- Attachment #3: community_aac24701.patch --]
[-- Type: application/octet-stream, Size: 14698 bytes --]

diff -purN a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aachba.c	2009-09-25 11:43:12.000000000 +0530
@@ -293,7 +293,10 @@ int aac_get_config_status(struct aac_dev
 			status = -EINVAL;
 		}
 	}
-	aac_fib_complete(fibptr);
+	/* Do not set XferState to zero unless receives a response from F/W */
+	if (status >= 0)
+		aac_fib_complete(fibptr);
+
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
 		if ((aac_commit == 1) || commit_flag) {
@@ -310,13 +313,18 @@ int aac_get_config_status(struct aac_dev
 				    FsaNormal,
 				    1, 1,
 				    NULL, NULL);
-			aac_fib_complete(fibptr);
+			/* Do not set XferState to zero unless
+			 * receives a response from F/W */
+			if (status >= 0)
+				aac_fib_complete(fibptr);
 		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 	return status;
 }
 
@@ -355,7 +363,9 @@ int aac_get_containers(struct aac_dev *d
 		maximum_num_containers = le32_to_cpu(dresp->ContainerSwitchEntries);
 		aac_fib_complete(fibptr);
 	}
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibptr);
 
 	if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS)
 		maximum_num_containers = MAXIMUM_NUM_CONTAINERS;
@@ -1245,8 +1255,12 @@ int aac_get_adapter_info(struct aac_dev*
 			 NULL);
 
 	if (rcode < 0) {
-		aac_fib_complete(fibptr);
-		aac_fib_free(fibptr);
+		/* FIB should be freed only after
+		 * getting the response from the F/W */
+		if (rcode != -ERESTARTSYS) {
+			aac_fib_complete(fibptr);
+			aac_fib_free(fibptr);
+		}
 		return rcode;
 	}
 	memcpy(&dev->adapter_info, info, sizeof(*info));
@@ -1270,6 +1284,12 @@ int aac_get_adapter_info(struct aac_dev*
 
 		if (rcode >= 0)
 			memcpy(&dev->supplement_adapter_info, sinfo, sizeof(*sinfo));
+		if (rcode == -ERESTARTSYS) {
+			fibptr = aac_fib_alloc(dev);
+			if (!fibptr)
+				return -ENOMEM;
+		}
+
 	}
 
 
@@ -1470,9 +1490,11 @@ int aac_get_adapter_info(struct aac_dev*
 			  (dev->scsi_host_ptr->sg_tablesize * 8) + 112;
 		}
 	}
-
-	aac_fib_complete(fibptr);
-	aac_fib_free(fibptr);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (rcode != -ERESTARTSYS) {
+		aac_fib_complete(fibptr);
+		aac_fib_free(fibptr);
+	}
 
 	return rcode;
 }
@@ -1633,6 +1655,7 @@ static int aac_read(struct scsi_cmnd * s
 	 *	Alocate and initialize a Fib
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
+		printk(KERN_WARNING "aac_read: fib allocation failed\n");
 		return -1;
 	}
 
@@ -1712,9 +1735,14 @@ static int aac_write(struct scsi_cmnd * 
 	 *	Allocate and initialize a Fib then setup a BlockWrite command
 	 */
 	if (!(cmd_fibcontext = aac_fib_alloc(dev))) {
-		scsicmd->result = DID_ERROR << 16;
-		scsicmd->scsi_done(scsicmd);
-		return 0;
+		/* FIB temporarily unavailable,not catastrophic failure */
+
+		/* scsicmd->result = DID_ERROR << 16;
+		 * scsicmd->scsi_done(scsicmd);
+		 * return 0;
+		 */
+		printk(KERN_WARNING "aac_write: fib allocation failed\n");
+		return -1;
 	}
 
 	status = aac_adapter_write(cmd_fibcontext, scsicmd, lba, count, fua);
diff -purN a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/aacraid.h	2009-09-25 12:13:55.000000000 +0530
@@ -12,7 +12,7 @@
  *----------------------------------------------------------------------------*/
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 2461
+# define AAC_DRIVER_BUILD 24701
 # define AAC_DRIVER_BRANCH "-ms"
 #endif
 #define MAXIMUM_NUM_CONTAINERS	32
@@ -1036,6 +1036,9 @@ struct aac_dev
 	u8			printf_enabled;
 	u8			in_reset;
 	u8			msi;
+	int			management_fib_count;
+	spinlock_t		manage_lock;
+
 };
 
 #define aac_adapter_interrupt(dev) \
diff -purN a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
--- a/drivers/scsi/aacraid/commctrl.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commctrl.c	2009-09-25 12:07:44.000000000 +0530
@@ -153,7 +153,7 @@ cleanup:
 		fibptr->hw_fib_pa = hw_fib_pa;
 		fibptr->hw_fib_va = hw_fib;
 	}
-	if (retval != -EINTR)
+	if (retval != -ERESTARTSYS)
 		aac_fib_free(fibptr);
 	return retval;
 }
@@ -322,7 +322,7 @@ return_fib:
 		}
 		if (f.wait) {
 			if(down_interruptible(&fibctx->wait_sem) < 0) {
-				status = -EINTR;
+				status = -ERESTARTSYS;
 			} else {
 				/* Lock again and retry */
 				spin_lock_irqsave(&dev->fib_lock, flags);
@@ -593,10 +593,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -645,10 +645,10 @@ static int aac_send_raw_srb(struct aac_d
 				u64 addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -695,10 +695,10 @@ static int aac_send_raw_srb(struct aac_d
 				uintptr_t addr;
 				void* p;
 				if (usg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -734,10 +734,10 @@ static int aac_send_raw_srb(struct aac_d
 				dma_addr_t addr;
 				void* p;
 				if (upsg->sg[i].count >
-				    (dev->adapter_info.options &
+				    ((dev->adapter_info.options &
 				     AAC_OPT_NEW_COMM) ?
 				      (dev->scsi_host_ptr->max_sectors << 9) :
-				      65536) {
+				      65536)) {
 					rcode = -EINVAL;
 					goto cleanup;
 				}
@@ -772,8 +772,8 @@ static int aac_send_raw_srb(struct aac_d
 		psg->count = cpu_to_le32(sg_indx+1);
 		status = aac_fib_send(ScsiPortCommand, srbfib, actual_fibsize, FsaNormal, 1, 1, NULL, NULL);
 	}
-	if (status == -EINTR) {
-		rcode = -EINTR;
+	if (status == -ERESTARTSYS) {
+		rcode = -ERESTARTSYS;
 		goto cleanup;
 	}
 
@@ -810,7 +810,7 @@ cleanup:
 	for(i=0; i <= sg_indx; i++){
 		kfree(sg_list[i]);
 	}
-	if (rcode != -EINTR) {
+	if (rcode != -ERESTARTSYS) {
 		aac_fib_complete(srbfib);
 		aac_fib_free(srbfib);
 	}
@@ -842,13 +842,22 @@ static int aac_get_pci_info(struct aac_d
 int aac_do_ioctl(struct aac_dev * dev, int cmd, void __user *arg)
 {
 	int status;
+	unsigned long mflags;
 
 	/*
 	 *	HBA gets first crack
 	 */
 
+	spin_lock_irqsave(&dev->manage_lock, mflags);
+	if (dev->management_fib_count > AAC_NUM_MGT_FIB) {
+		printk(KERN_INFO "No management Fibs Available:%d\n",
+						dev->management_fib_count);
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev->manage_lock, mflags);
 	status = aac_dev_ioctl(dev, cmd, arg);
-	if(status != -ENOTTY)
+	if (status != -ENOTTY)
 		return status;
 
 	switch (cmd) {
diff -purN a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
--- a/drivers/scsi/aacraid/comminit.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/comminit.c	2009-09-25 11:54:23.000000000 +0530
@@ -194,7 +194,9 @@ int aac_send_shutdown(struct aac_dev * d
 
 	if (status >= 0)
 		aac_fib_complete(fibctx);
-	aac_fib_free(fibctx);
+	/* FIB should be freed only after getting the response from the F/W */
+	if (status != -ERESTARTSYS)
+		aac_fib_free(fibctx);
 	return status;
 }
 
@@ -304,6 +306,8 @@ struct aac_dev *aac_init_adapter(struct 
 	/*
 	 *	Check the preferred comm settings, defaults from template.
 	 */
+	dev->management_fib_count = 0;
+	spin_lock_init(&dev->manage_lock);
 	dev->max_fib_size = sizeof(struct hw_fib);
 	dev->sg_tablesize = host->sg_tablesize = (dev->max_fib_size
 		- sizeof(struct aac_fibhdr)
diff -purN a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
--- a/drivers/scsi/aacraid/commsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/commsup.c	2009-09-25 11:57:00.000000000 +0530
@@ -189,7 +189,14 @@ struct fib *aac_fib_alloc(struct aac_dev
 
 void aac_fib_free(struct fib *fibptr)
 {
-	unsigned long flags;
+	unsigned long flags, flagsv;
+
+	spin_lock_irqsave(&fibptr->event_lock, flagsv);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
+		return;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flagsv);
 
 	spin_lock_irqsave(&fibptr->dev->fib_lock, flags);
 	if (unlikely(fibptr->flags & FIB_CONTEXT_FLAG_TIMED_OUT))
@@ -473,14 +480,27 @@ int aac_fib_send(u16 command, struct fib
 
 	if(wait)
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-	aac_adapter_deliver(fibptr);
+
+	if (aac_adapter_deliver(fibptr) != 0) {
+		printk(KERN_ERR "aac_fib_send: returned -EBUSY\n");
+		if (wait)
+			spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return -EBUSY;
+	}
+
 
 	/*
 	 *	If the caller wanted us to wait for response wait now.
 	 */
 
 	if (wait) {
+		unsigned long mflags;
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
+		spin_lock_irqsave(&dev->manage_lock, mflags);
+		dev->management_fib_count++;
+		spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 		/* Only set for first known interruptable command */
 		if (wait < 0) {
 			/*
@@ -515,15 +535,14 @@ int aac_fib_send(u16 command, struct fib
 				}
 				udelay(5);
 			}
-		} else if (down_interruptible(&fibptr->event_wait)) {
-			fibptr->done = 2;
-			up(&fibptr->event_wait);
-		}
+		} else
+			down_interruptible(&fibptr->event_wait);
+
 		spin_lock_irqsave(&fibptr->event_lock, flags);
-		if ((fibptr->done == 0) || (fibptr->done == 2)) {
+		if (fibptr->done == 0) {
 			fibptr->done = 2; /* Tell interrupt we aborted */
 			spin_unlock_irqrestore(&fibptr->event_lock, flags);
-			return -EINTR;
+			return -ERESTARTSYS;
 		}
 		spin_unlock_irqrestore(&fibptr->event_lock, flags);
 		BUG_ON(fibptr->done == 0);
@@ -689,6 +708,7 @@ int aac_fib_adapter_complete(struct fib 
 
 int aac_fib_complete(struct fib *fibptr)
 {
+	unsigned long flags;
 	struct hw_fib * hw_fib = fibptr->hw_fib_va;
 
 	/*
@@ -709,6 +729,13 @@ int aac_fib_complete(struct fib *fibptr)
 	 *	command is complete that we had sent to the adapter and this
 	 *	cdb could be reused.
 	 */
+	spin_lock_irqsave(&fibptr->event_lock, flags);
+	if (fibptr->done == 2) {
+		spin_unlock_irqrestore(&fibptr->event_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&fibptr->event_lock, flags);
+
 	if((hw_fib->header.XferState & cpu_to_le32(SentFromHost)) &&
 		(hw_fib->header.XferState & cpu_to_le32(AdapterProcessed)))
 	{
@@ -1355,7 +1382,10 @@ int aac_reset_adapter(struct aac_dev * a
 
 			if (status >= 0)
 				aac_fib_complete(fibctx);
-			aac_fib_free(fibctx);
+			/* FIB should be freed only after getting
+			 * the response from the F/W */
+			if (status != -ERESTARTSYS)
+				aac_fib_free(fibctx);
 		}
 	}
 
@@ -1759,6 +1789,7 @@ int aac_command_thread(void *data)
 				struct fib *fibptr;
 
 				if ((fibptr = aac_fib_alloc(dev))) {
+					int status;
 					__le32 *info;
 
 					aac_fib_init(fibptr);
@@ -1769,15 +1800,21 @@ int aac_command_thread(void *data)
 
 					*info = cpu_to_le32(now.tv_sec);
 
-					(void)aac_fib_send(SendHostTime,
+					status = aac_fib_send(SendHostTime,
 						fibptr,
 						sizeof(*info),
 						FsaNormal,
 						1, 1,
 						NULL,
 						NULL);
-					aac_fib_complete(fibptr);
-					aac_fib_free(fibptr);
+					/* Do not set XferState to zero unless
+					 * receives a response from F/W */
+					if (status >= 0)
+						aac_fib_complete(fibptr);
+					/* FIB should be freed only after
+					 * getting the response from the F/W */
+					if (status != -ERESTARTSYS)
+						aac_fib_free(fibptr);
 				}
 				difference = (long)(unsigned)update_interval*HZ;
 			} else {
diff -purN a/drivers/scsi/aacraid/dpcsup.c b/drivers/scsi/aacraid/dpcsup.c
--- a/drivers/scsi/aacraid/dpcsup.c	2009-09-10 03:43:59.000000000 +0530
+++ b/drivers/scsi/aacraid/dpcsup.c	2009-09-25 12:08:44.000000000 +0530
@@ -57,9 +57,9 @@ unsigned int aac_response_normal(struct 
 	struct hw_fib * hwfib;
 	struct fib * fib;
 	int consumed = 0;
-	unsigned long flags;
+	unsigned long flags, mflags;
 
-	spin_lock_irqsave(q->lock, flags);	
+	spin_lock_irqsave(q->lock, flags);
 	/*
 	 *	Keep pulling response QEs off the response queue and waking
 	 *	up the waiters until there are no more QEs. We then return
@@ -125,12 +125,21 @@ unsigned int aac_response_normal(struct 
 		} else {
 			unsigned long flagv;
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
 			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
 				aac_fib_complete(fib);
 				aac_fib_free(fib);
 			}
@@ -232,6 +241,7 @@ unsigned int aac_command_normal(struct a
 
 unsigned int aac_intr_normal(struct aac_dev * dev, u32 index)
 {
+	unsigned long mflags;
 	dprintk((KERN_INFO "aac_intr_normal(%p,%x)\n", dev, index));
 	if ((index & 0x00000002L)) {
 		struct hw_fib * hw_fib;
@@ -320,11 +330,25 @@ unsigned int aac_intr_normal(struct aac_
 			unsigned long flagv;
 	  		dprintk((KERN_INFO "event_wait up\n"));
 			spin_lock_irqsave(&fib->event_lock, flagv);
-			if (!fib->done)
+			if (!fib->done) {
 				fib->done = 1;
-			up(&fib->event_wait);
+				up(&fib->event_wait);
+			}
 			spin_unlock_irqrestore(&fib->event_lock, flagv);
+
+			spin_lock_irqsave(&dev->manage_lock, mflags);
+			dev->management_fib_count--;
+			spin_unlock_irqrestore(&dev->manage_lock, mflags);
+
 			FIB_COUNTER_INCREMENT(aac_config.NormalRecved);
+			if (fib->done == 2) {
+				spin_lock_irqsave(&fib->event_lock, flagv);
+				fib->done = 0;
+				spin_unlock_irqrestore(&fib->event_lock, flagv);
+				aac_fib_complete(fib);
+				aac_fib_free(fib);
+			}
+
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2009-12-24  6:31 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05 13:27 [PATCH ] scsi-misc-2.6: File System going into read-only mode Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-11 19:26 ` James Bottomley
2009-11-12  8:28   ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-13 20:55     ` James Bottomley
2009-11-16  3:25       ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-16 15:51         ` James Bottomley
2009-11-17  5:22           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-17  5:27           ` Penchala Narasimha Reddy Chilakala, ERS-HCLTech
  -- strict thread matches above, loose matches on Subject: below --
2009-12-24  6:29 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-12-21 13:09 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-12-18 12:25 Penchala Narasimha Reddy Chilakala, ERS-HCLTech
2009-11-02  7:35 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-02 16:30 ` Stefan Richter
2009-11-03  9:54   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 19:02     ` Stefan Richter
2009-11-04  7:00       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04  9:18       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 14:48         ` Stefan Richter
2009-11-02 17:45 ` James Bottomley
2009-11-03  9:38   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 15:08     ` James Bottomley
2009-11-04  7:07       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 15:42         ` James Bottomley
2009-10-09  9:23 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-10-22  1:09 ` James Bottomley
2009-10-23 13:10   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-29  8:47 Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-29 14:19 ` James Bottomley
2009-09-30 11:09   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-09-30 11:33     ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-10-07  6:02       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-02 17:46 ` James Bottomley
2009-11-03 10:17   ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-03 14:40     ` James Bottomley
2009-11-04  1:23       ` Stefan Richter
2009-11-04  1:33         ` Stefan Richter
2009-11-04  1:40           ` Stefan Richter
2009-11-04  6:44         ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04  7:21       ` Penchala Narasimha Reddy Chilakala, TLS-Chennai
2009-11-04 15:30         ` James Bottomley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.