All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix double free of MPT request frames.
@ 2009-05-26 19:25 Alok Kataria
  2009-05-27 22:54 ` Alok Kataria
  0 siblings, 1 reply; 7+ messages in thread
From: Alok Kataria @ 2009-05-26 19:25 UTC (permalink / raw)
  To: Eric.Moore; +Cc: DL-MPTFusionLinux, linux-scsi

Hi, 

While testing scsi path failover for disks using MPT drivers we hit a
kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
that this is due to a race present in the mpt scsi code path. The same
race seems to be present in the latest git kernel code too.

Here is the description of the race,
If a command has run for too long - which can happen often in the
failover case - the SCSI stack decides to abort that command and invokes
a device's abort handler. 
The code flow is...

mptscsih_abort ==> mptscsih_TMHandle (ABORT_TASK) ==>
mptscsih_IssueTaskMgmt (submits the command ) ==>
mptscsih_tm_wait_for_completion()

the mptscsih_tm_wait_for_completion code looks like this

do {
                spin_lock_irqsave(&ioc->FreeQlock, flags);
                if(hd->tmPending == 0) {
                        status = SUCCESS;
                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
                        break;
                }
                spin_unlock_irqrestore(&ioc->FreeQlock, flags);
                msleep(250);                        <<==================
        } while (--loop_count);

where its looping on hd->tmPending (SUCCESS) or until timeout expires
(FAILURE).

Every thing works fine in the success case, in the failure case we
return to mptscsih_IssueTaskMgmt, and call mpt_HardResetHandler, where
we throw all inflight commands (hard reset) and free request via
mpt_free_msg_frame. Which also seems fine.

Now notice in the loop above that in the last iteration we sleep for
250ms after the last read to tmPending, the TM command could complete
between this check of tmPending and before the adapter is hard reset. 
In such a case the request is double freed once when the command
completes (interrupt gets delivered and mpt_reply releases request
frame), and once after the mpt_HardResetHandler. This results in
corruption of  the linked list of empty request frames and causes a oops
on next list access.

The patch below attempts to fix this race by freeing the request_frame
before mpt_HardReset and doing that atomically after checking for
tmPending value. Also the mptscsih_taskmgmt_complete function returns
"1" irrespective of the tmPending value which results in always freeing
of the request frame from mpt_reply, so I have modified that function to
look at the tmPending value before returning. 

While I am at it, I have also added a BUG_ON statement in
mpt_free_msg_frame to check for double free, instead of silently
corrupting the list.

Please consider the patch below for inclusion.


Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: <stable@kernel.org>
---

 drivers/message/fusion/mptbase.c  |   11 ++++-------
 drivers/message/fusion/mptbase.h  |   10 +++++++++-
 drivers/message/fusion/mptscsih.c |   25 +++++++++++++++++++++----
 3 files changed, 34 insertions(+), 12 deletions(-)


diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 5d496a9..a1637bf 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -974,7 +974,7 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
 /**
- *	mpt_free_msg_frame - Place MPT request frame back on FreeQ.
+ *	__mpt_free_msg_frame - Place MPT request frame back on FreeQ.
  *	@ioc: Pointer to MPT adapter structure
  *	@mf: Pointer to MPT request frame
  *
@@ -982,18 +982,15 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
  *	FreeQ.
  */
 void
-mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
+__mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
 {
-	unsigned long flags;
-
+	BUG_ON(mf->u.frame.linkage.arg1 == 0xdeadbeaf);
 	/*  Put Request back on FreeQ!  */
-	spin_lock_irqsave(&ioc->FreeQlock, flags);
 	mf->u.frame.linkage.arg1 = 0xdeadbeaf; /* signature to know if this mf is freed */
 	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
 #ifdef MFCNT
 	ioc->mfcnt--;
 #endif
-	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
 }
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
@@ -7612,7 +7609,7 @@ EXPORT_SYMBOL(mpt_device_driver_deregister);
 EXPORT_SYMBOL(mpt_get_msg_frame);
 EXPORT_SYMBOL(mpt_put_msg_frame);
 EXPORT_SYMBOL(mpt_put_msg_frame_hi_pri);
-EXPORT_SYMBOL(mpt_free_msg_frame);
+EXPORT_SYMBOL(__mpt_free_msg_frame);
 EXPORT_SYMBOL(mpt_add_sge);
 EXPORT_SYMBOL(mpt_send_handshake_request);
 EXPORT_SYMBOL(mpt_verify_adapter);
diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
index b3e981d..4e4ee53 100644
--- a/drivers/message/fusion/mptbase.h
+++ b/drivers/message/fusion/mptbase.h
@@ -906,7 +906,7 @@ extern void	 mpt_reset_deregister(u8 cb_idx);
 extern int	 mpt_device_driver_register(struct mpt_pci_driver * dd_cbfunc, u8 cb_idx);
 extern void	 mpt_device_driver_deregister(u8 cb_idx);
 extern MPT_FRAME_HDR	*mpt_get_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc);
-extern void	 mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
+extern void	 __mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
 extern void	 mpt_put_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
 extern void	 mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
 extern void	 mpt_add_sge(char *pAddr, u32 flagslength, dma_addr_t dma_addr);
@@ -924,6 +924,14 @@ extern int	 mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode);
 extern int	 mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk);
 extern void     mpt_halt_firmware(MPT_ADAPTER *ioc);
 
+static inline void
+mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&ioc->FreeQlock, flags);
+	__mpt_free_msg_frame(ioc, mf);
+	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
+}
 
 /*
  *  Public data decl's...
diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index e62c6bc..afe7fc0 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1719,6 +1719,18 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
 	}
 
 	if(mptscsih_tm_wait_for_completion(hd, timeout) == FAILED) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&ioc->FreeQlock, flags);
+		if(hd->tmPending == 0) {
+                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
+                        goto out_success;
+                }
+		__mpt_free_msg_frame(ioc, mf);
+		hd->tmPending = 0;
+		hd->tmState = TM_STATE_NONE;
+		spin_unlock_irqrestore(&ioc->FreeQlock, flags);
+
 		dfailprintk(ioc, printk(MYIOC_s_ERR_FMT "task management request TIMED OUT!"
 			" (hd %p, ioc %p, mf %p) \n", ioc->name, hd,
 			ioc, mf));
@@ -1727,9 +1739,10 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
 		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
 		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
 			 ioc->name, retval));
-		goto fail_out;
+		return FAILED;
 	}
 
+out_success:
 	/*
 	 * Handle success case, see if theres a non-zero ioc_status.
 	 */
@@ -2159,6 +2172,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
 	u16			 iocstatus;
 	u8			 tmType;
 	u32			 termination_count;
+	int			 retval = 1;
 
 	dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "TaskMgmt completed (mf=%p,mr=%p)\n",
 	    ioc->name, mf, mr));
@@ -2235,12 +2249,15 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
 
  out:
 	spin_lock_irqsave(&ioc->FreeQlock, flags);
-	hd->tmPending = 0;
-	hd->tmState = TM_STATE_NONE;
+	if (hd->tmPending) {
+		hd->tmPending = 0;
+		hd->tmState = TM_STATE_NONE;
+	} else
+		retval = 0;
 	hd->tm_iocstatus = iocstatus;
 	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
 
-	return 1;
+	return retval;
 }
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix double free of MPT request frames.
  2009-05-26 19:25 [PATCH] Fix double free of MPT request frames Alok Kataria
@ 2009-05-27 22:54 ` Alok Kataria
  2009-05-30  5:56   ` Desai, Kashyap
  0 siblings, 1 reply; 7+ messages in thread
From: Alok Kataria @ 2009-05-27 22:54 UTC (permalink / raw)
  To: Eric.Moore, James.Bottomley; +Cc: DL-MPTFusionLinux, linux-scsi

Ccing James...any comments on the patch below ? 

Thanks,
Alok

On Tue, 2009-05-26 at 12:25 -0700, Alok Kataria wrote:
> Hi, 
> 
> While testing scsi path failover for disks using MPT drivers we hit a
> kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> that this is due to a race present in the mpt scsi code path. The same
> race seems to be present in the latest git kernel code too.
> 
> Here is the description of the race,
> If a command has run for too long - which can happen often in the
> failover case - the SCSI stack decides to abort that command and invokes
> a device's abort handler. 
> The code flow is...
> 
> mptscsih_abort ==> mptscsih_TMHandle (ABORT_TASK) ==>
> mptscsih_IssueTaskMgmt (submits the command ) ==>
> mptscsih_tm_wait_for_completion()
> 
> the mptscsih_tm_wait_for_completion code looks like this
> 
> do {
>                 spin_lock_irqsave(&ioc->FreeQlock, flags);
>                 if(hd->tmPending == 0) {
>                         status = SUCCESS;
>                         spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>                         break;
>                 }
>                 spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>                 msleep(250);                        <<==================
>         } while (--loop_count);
> 
> where its looping on hd->tmPending (SUCCESS) or until timeout expires
> (FAILURE).
> 
> Every thing works fine in the success case, in the failure case we
> return to mptscsih_IssueTaskMgmt, and call mpt_HardResetHandler, where
> we throw all inflight commands (hard reset) and free request via
> mpt_free_msg_frame. Which also seems fine.
> 
> Now notice in the loop above that in the last iteration we sleep for
> 250ms after the last read to tmPending, the TM command could complete
> between this check of tmPending and before the adapter is hard reset. 
> In such a case the request is double freed once when the command
> completes (interrupt gets delivered and mpt_reply releases request
> frame), and once after the mpt_HardResetHandler. This results in
> corruption of  the linked list of empty request frames and causes a oops
> on next list access.
> 
> The patch below attempts to fix this race by freeing the request_frame
> before mpt_HardReset and doing that atomically after checking for
> tmPending value. Also the mptscsih_taskmgmt_complete function returns
> "1" irrespective of the tmPending value which results in always freeing
> of the request frame from mpt_reply, so I have modified that function to
> look at the tmPending value before returning. 
> 
> While I am at it, I have also added a BUG_ON statement in
> mpt_free_msg_frame to check for double free, instead of silently
> corrupting the list.
> 
> Please consider the patch below for inclusion.
> 
> 
> Signed-off-by: Alok N Kataria <akataria@vmware.com>
> Cc: <stable@kernel.org>
> ---
> 
>  drivers/message/fusion/mptbase.c  |   11 ++++-------
>  drivers/message/fusion/mptbase.h  |   10 +++++++++-
>  drivers/message/fusion/mptscsih.c |   25 +++++++++++++++++++++----
>  3 files changed, 34 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> index 5d496a9..a1637bf 100644
> --- a/drivers/message/fusion/mptbase.c
> +++ b/drivers/message/fusion/mptbase.c
> @@ -974,7 +974,7 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
>  /**
> - *	mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> + *	__mpt_free_msg_frame - Place MPT request frame back on FreeQ.
>   *	@ioc: Pointer to MPT adapter structure
>   *	@mf: Pointer to MPT request frame
>   *
> @@ -982,18 +982,15 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>   *	FreeQ.
>   */
>  void
> -mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> +__mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>  {
> -	unsigned long flags;
> -
> +	BUG_ON(mf->u.frame.linkage.arg1 == 0xdeadbeaf);
>  	/*  Put Request back on FreeQ!  */
> -	spin_lock_irqsave(&ioc->FreeQlock, flags);
>  	mf->u.frame.linkage.arg1 = 0xdeadbeaf; /* signature to know if this mf is freed */
>  	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
>  #ifdef MFCNT
>  	ioc->mfcnt--;
>  #endif
> -	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>  }
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> @@ -7612,7 +7609,7 @@ EXPORT_SYMBOL(mpt_device_driver_deregister);
>  EXPORT_SYMBOL(mpt_get_msg_frame);
>  EXPORT_SYMBOL(mpt_put_msg_frame);
>  EXPORT_SYMBOL(mpt_put_msg_frame_hi_pri);
> -EXPORT_SYMBOL(mpt_free_msg_frame);
> +EXPORT_SYMBOL(__mpt_free_msg_frame);
>  EXPORT_SYMBOL(mpt_add_sge);
>  EXPORT_SYMBOL(mpt_send_handshake_request);
>  EXPORT_SYMBOL(mpt_verify_adapter);
> diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
> index b3e981d..4e4ee53 100644
> --- a/drivers/message/fusion/mptbase.h
> +++ b/drivers/message/fusion/mptbase.h
> @@ -906,7 +906,7 @@ extern void	 mpt_reset_deregister(u8 cb_idx);
>  extern int	 mpt_device_driver_register(struct mpt_pci_driver * dd_cbfunc, u8 cb_idx);
>  extern void	 mpt_device_driver_deregister(u8 cb_idx);
>  extern MPT_FRAME_HDR	*mpt_get_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc);
> -extern void	 mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> +extern void	 __mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_put_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_add_sge(char *pAddr, u32 flagslength, dma_addr_t dma_addr);
> @@ -924,6 +924,14 @@ extern int	 mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode);
>  extern int	 mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk);
>  extern void     mpt_halt_firmware(MPT_ADAPTER *ioc);
>  
> +static inline void
> +mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> +{
> +	unsigned long flags;
> +	spin_lock_irqsave(&ioc->FreeQlock, flags);
> +	__mpt_free_msg_frame(ioc, mf);
> +	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +}
>  
>  /*
>   *  Public data decl's...
> diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
> index e62c6bc..afe7fc0 100644
> --- a/drivers/message/fusion/mptscsih.c
> +++ b/drivers/message/fusion/mptscsih.c
> @@ -1719,6 +1719,18 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
>  	}
>  
>  	if(mptscsih_tm_wait_for_completion(hd, timeout) == FAILED) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&ioc->FreeQlock, flags);
> +		if(hd->tmPending == 0) {
> +                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +                        goto out_success;
> +                }
> +		__mpt_free_msg_frame(ioc, mf);
> +		hd->tmPending = 0;
> +		hd->tmState = TM_STATE_NONE;
> +		spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +
>  		dfailprintk(ioc, printk(MYIOC_s_ERR_FMT "task management request TIMED OUT!"
>  			" (hd %p, ioc %p, mf %p) \n", ioc->name, hd,
>  			ioc, mf));
> @@ -1727,9 +1739,10 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
>  		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>  		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
>  			 ioc->name, retval));
> -		goto fail_out;
> +		return FAILED;
>  	}
>  
> +out_success:
>  	/*
>  	 * Handle success case, see if theres a non-zero ioc_status.
>  	 */
> @@ -2159,6 +2172,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
>  	u16			 iocstatus;
>  	u8			 tmType;
>  	u32			 termination_count;
> +	int			 retval = 1;
>  
>  	dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "TaskMgmt completed (mf=%p,mr=%p)\n",
>  	    ioc->name, mf, mr));
> @@ -2235,12 +2249,15 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
>  
>   out:
>  	spin_lock_irqsave(&ioc->FreeQlock, flags);
> -	hd->tmPending = 0;
> -	hd->tmState = TM_STATE_NONE;
> +	if (hd->tmPending) {
> +		hd->tmPending = 0;
> +		hd->tmState = TM_STATE_NONE;
> +	} else
> +		retval = 0;
>  	hd->tm_iocstatus = iocstatus;
>  	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>  
> -	return 1;
> +	return retval;
>  }
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Fix double free of MPT request frames.
  2009-05-27 22:54 ` Alok Kataria
@ 2009-05-30  5:56   ` Desai, Kashyap
  2009-05-30 18:34     ` Alok Kataria
  0 siblings, 1 reply; 7+ messages in thread
From: Desai, Kashyap @ 2009-05-30  5:56 UTC (permalink / raw)
  To: akataria, Moore, Eric, James.Bottomley
  Cc: DL-MPT Fusion Linux, linux-scsi, Prakash, Sathya

Alok,

This part of code is obsolete in LSI's new driver. 
Very soon you will not see " mptscsih_tm_wait_for_completion" fuction.
It is replaced by "wait_for_completion_timeout()".

I will say for your kernel panic due to double free in msg frame link list can be avoided by changing " mpt_free_msg_frame". See snippet of code below.
------------------------------------------------------------------
void
mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
{
	unsigned long flags;

	/*  Put Request back on FreeQ!  */
	spin_lock_irqsave(&ioc->FreeQlock, flags);
	if (cpu_to_le32(mf->u.frame.linkage.arg1) == 0xdeadbeaf)  <----------
		goto out;
	/* signature to know if this mf is freed */
	mf->u.frame.linkage.arg1 = cpu_to_le32(0xdeadbeaf);
	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
out:
	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
}
----------------------------------------------------------------


This is how we are doing in LSI's Inbox driver. 


Thanks,
Kashyap

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Alok Kataria
Sent: Thursday, May 28, 2009 4:25 AM
To: Moore, Eric; James.Bottomley@HansenPartnership.com
Cc: DL-MPT Fusion Linux; linux-scsi@vger.kernel.org
Subject: Re: [PATCH] Fix double free of MPT request frames.

Ccing James...any comments on the patch below ? 

Thanks,
Alok

On Tue, 2009-05-26 at 12:25 -0700, Alok Kataria wrote:
> Hi, 
> 
> While testing scsi path failover for disks using MPT drivers we hit a
> kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> that this is due to a race present in the mpt scsi code path. The same
> race seems to be present in the latest git kernel code too.
> 
> Here is the description of the race,
> If a command has run for too long - which can happen often in the
> failover case - the SCSI stack decides to abort that command and invokes
> a device's abort handler. 
> The code flow is...
> 
> mptscsih_abort ==> mptscsih_TMHandle (ABORT_TASK) ==>
> mptscsih_IssueTaskMgmt (submits the command ) ==>
> mptscsih_tm_wait_for_completion()
> 
> the mptscsih_tm_wait_for_completion code looks like this
> 
> do {
>                 spin_lock_irqsave(&ioc->FreeQlock, flags);
>                 if(hd->tmPending == 0) {
>                         status = SUCCESS;
>                         spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>                         break;
>                 }
>                 spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>                 msleep(250);                        <<==================
>         } while (--loop_count);
> 
> where its looping on hd->tmPending (SUCCESS) or until timeout expires
> (FAILURE).
> 
> Every thing works fine in the success case, in the failure case we
> return to mptscsih_IssueTaskMgmt, and call mpt_HardResetHandler, where
> we throw all inflight commands (hard reset) and free request via
> mpt_free_msg_frame. Which also seems fine.
> 
> Now notice in the loop above that in the last iteration we sleep for
> 250ms after the last read to tmPending, the TM command could complete
> between this check of tmPending and before the adapter is hard reset. 
> In such a case the request is double freed once when the command
> completes (interrupt gets delivered and mpt_reply releases request
> frame), and once after the mpt_HardResetHandler. This results in
> corruption of  the linked list of empty request frames and causes a oops
> on next list access.
> 
> The patch below attempts to fix this race by freeing the request_frame
> before mpt_HardReset and doing that atomically after checking for
> tmPending value. Also the mptscsih_taskmgmt_complete function returns
> "1" irrespective of the tmPending value which results in always freeing
> of the request frame from mpt_reply, so I have modified that function to
> look at the tmPending value before returning. 
> 
> While I am at it, I have also added a BUG_ON statement in
> mpt_free_msg_frame to check for double free, instead of silently
> corrupting the list.
> 
> Please consider the patch below for inclusion.
> 
> 
> Signed-off-by: Alok N Kataria <akataria@vmware.com>
> Cc: <stable@kernel.org>
> ---
> 
>  drivers/message/fusion/mptbase.c  |   11 ++++-------
>  drivers/message/fusion/mptbase.h  |   10 +++++++++-
>  drivers/message/fusion/mptscsih.c |   25 +++++++++++++++++++++----
>  3 files changed, 34 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> index 5d496a9..a1637bf 100644
> --- a/drivers/message/fusion/mptbase.c
> +++ b/drivers/message/fusion/mptbase.c
> @@ -974,7 +974,7 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
>  /**
> - *	mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> + *	__mpt_free_msg_frame - Place MPT request frame back on FreeQ.
>   *	@ioc: Pointer to MPT adapter structure
>   *	@mf: Pointer to MPT request frame
>   *
> @@ -982,18 +982,15 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>   *	FreeQ.
>   */
>  void
> -mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> +__mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
>  {
> -	unsigned long flags;
> -
> +	BUG_ON(mf->u.frame.linkage.arg1 == 0xdeadbeaf);
>  	/*  Put Request back on FreeQ!  */
> -	spin_lock_irqsave(&ioc->FreeQlock, flags);
>  	mf->u.frame.linkage.arg1 = 0xdeadbeaf; /* signature to know if this mf is freed */
>  	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
>  #ifdef MFCNT
>  	ioc->mfcnt--;
>  #endif
> -	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>  }
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> @@ -7612,7 +7609,7 @@ EXPORT_SYMBOL(mpt_device_driver_deregister);
>  EXPORT_SYMBOL(mpt_get_msg_frame);
>  EXPORT_SYMBOL(mpt_put_msg_frame);
>  EXPORT_SYMBOL(mpt_put_msg_frame_hi_pri);
> -EXPORT_SYMBOL(mpt_free_msg_frame);
> +EXPORT_SYMBOL(__mpt_free_msg_frame);
>  EXPORT_SYMBOL(mpt_add_sge);
>  EXPORT_SYMBOL(mpt_send_handshake_request);
>  EXPORT_SYMBOL(mpt_verify_adapter);
> diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
> index b3e981d..4e4ee53 100644
> --- a/drivers/message/fusion/mptbase.h
> +++ b/drivers/message/fusion/mptbase.h
> @@ -906,7 +906,7 @@ extern void	 mpt_reset_deregister(u8 cb_idx);
>  extern int	 mpt_device_driver_register(struct mpt_pci_driver * dd_cbfunc, u8 cb_idx);
>  extern void	 mpt_device_driver_deregister(u8 cb_idx);
>  extern MPT_FRAME_HDR	*mpt_get_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc);
> -extern void	 mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> +extern void	 __mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_put_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
>  extern void	 mpt_add_sge(char *pAddr, u32 flagslength, dma_addr_t dma_addr);
> @@ -924,6 +924,14 @@ extern int	 mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode);
>  extern int	 mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk);
>  extern void     mpt_halt_firmware(MPT_ADAPTER *ioc);
>  
> +static inline void
> +mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> +{
> +	unsigned long flags;
> +	spin_lock_irqsave(&ioc->FreeQlock, flags);
> +	__mpt_free_msg_frame(ioc, mf);
> +	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +}
>  
>  /*
>   *  Public data decl's...
> diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
> index e62c6bc..afe7fc0 100644
> --- a/drivers/message/fusion/mptscsih.c
> +++ b/drivers/message/fusion/mptscsih.c
> @@ -1719,6 +1719,18 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
>  	}
>  
>  	if(mptscsih_tm_wait_for_completion(hd, timeout) == FAILED) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&ioc->FreeQlock, flags);
> +		if(hd->tmPending == 0) {
> +                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +                        goto out_success;
> +                }
> +		__mpt_free_msg_frame(ioc, mf);
> +		hd->tmPending = 0;
> +		hd->tmState = TM_STATE_NONE;
> +		spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> +
>  		dfailprintk(ioc, printk(MYIOC_s_ERR_FMT "task management request TIMED OUT!"
>  			" (hd %p, ioc %p, mf %p) \n", ioc->name, hd,
>  			ioc, mf));
> @@ -1727,9 +1739,10 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
>  		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>  		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
>  			 ioc->name, retval));
> -		goto fail_out;
> +		return FAILED;
>  	}
>  
> +out_success:
>  	/*
>  	 * Handle success case, see if theres a non-zero ioc_status.
>  	 */
> @@ -2159,6 +2172,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
>  	u16			 iocstatus;
>  	u8			 tmType;
>  	u32			 termination_count;
> +	int			 retval = 1;
>  
>  	dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "TaskMgmt completed (mf=%p,mr=%p)\n",
>  	    ioc->name, mf, mr));
> @@ -2235,12 +2249,15 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
>  
>   out:
>  	spin_lock_irqsave(&ioc->FreeQlock, flags);
> -	hd->tmPending = 0;
> -	hd->tmState = TM_STATE_NONE;
> +	if (hd->tmPending) {
> +		hd->tmPending = 0;
> +		hd->tmState = TM_STATE_NONE;
> +	} else
> +		retval = 0;
>  	hd->tm_iocstatus = iocstatus;
>  	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
>  
> -	return 1;
> +	return retval;
>  }
>  
>  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Fix double free of MPT request frames.
  2009-05-30  5:56   ` Desai, Kashyap
@ 2009-05-30 18:34     ` Alok Kataria
  2009-06-03  9:24       ` Desai, Kashyap
  0 siblings, 1 reply; 7+ messages in thread
From: Alok Kataria @ 2009-05-30 18:34 UTC (permalink / raw)
  To: Desai, Kashyap
  Cc: Moore, Eric, James.Bottomley, DL-MPT Fusion Linux, linux-scsi,
	Prakash, Sathya


On Fri, 2009-05-29 at 22:56 -0700, Desai, Kashyap wrote:
> Alok,
> 
> This part of code is obsolete in LSI's new driver. 
> Very soon you will not see " mptscsih_tm_wait_for_completion" fuction.
> It is replaced by "wait_for_completion_timeout()".

Where is that code checked in ? I don't see it for 2.6.30..
Anyways we still need a fix for the previous kernels. 
> 
> I will say for your kernel panic due to double free in msg frame link
> list can be avoided by changing " mpt_free_msg_frame".

Yeah we thought about it.
But that still leaves another race, there is no guarantee that the frame
would not be reused by mpt_HardResetHandler. Since after reset it can
call SendEventNotification, which can allocate message frame, which may
then confuse your freeing logic. If that happens it would result in we
freeing some other used frame. Another thing that we thought about was
having this check in mpt_free_msg_frame along with moving the
free_msg_frame call before mpt_HardResetHandler, but that too leaves a
window open for the request frame to be reused due to the 250ms msleep.

IMHO the best way to fix this is avoid having these double free calls in
the first place rather than putting in these checks in the
mpt_free_msg_frame function.

Thanks,
Alok
>  See snippet of code below.
> ------------------------------------------------------------------
> void
> mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> {
> 	unsigned long flags;
> 
> 	/*  Put Request back on FreeQ!  */
> 	spin_lock_irqsave(&ioc->FreeQlock, flags);
> 	if (cpu_to_le32(mf->u.frame.linkage.arg1) == 0xdeadbeaf)  <----------
> 		goto out;
> 	/* signature to know if this mf is freed */
> 	mf->u.frame.linkage.arg1 = cpu_to_le32(0xdeadbeaf);
> 	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
> out:
> 	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> }
> ----------------------------------------------------------------
> 
> 
> This is how we are doing in LSI's Inbox driver. 
> 
> 
> Thanks,
> Kashyap
> 
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Alok Kataria
> Sent: Thursday, May 28, 2009 4:25 AM
> To: Moore, Eric; James.Bottomley@HansenPartnership.com
> Cc: DL-MPT Fusion Linux; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] Fix double free of MPT request frames.
> 
> Ccing James...any comments on the patch below ? 
> 
> Thanks,
> Alok
> 
> On Tue, 2009-05-26 at 12:25 -0700, Alok Kataria wrote:
> > Hi, 
> > 
> > While testing scsi path failover for disks using MPT drivers we hit a
> > kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> > that this is due to a race present in the mpt scsi code path. The same
> > race seems to be present in the latest git kernel code too.
> > 
> > Here is the description of the race,
> > If a command has run for too long - which can happen often in the
> > failover case - the SCSI stack decides to abort that command and invokes
> > a device's abort handler. 
> > The code flow is...
> > 
> > mptscsih_abort ==> mptscsih_TMHandle (ABORT_TASK) ==>
> > mptscsih_IssueTaskMgmt (submits the command ) ==>
> > mptscsih_tm_wait_for_completion()
> > 
> > the mptscsih_tm_wait_for_completion code looks like this
> > 
> > do {
> >                 spin_lock_irqsave(&ioc->FreeQlock, flags);
> >                 if(hd->tmPending == 0) {
> >                         status = SUCCESS;
> >                         spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >                         break;
> >                 }
> >                 spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >                 msleep(250);                        <<==================
> >         } while (--loop_count);
> > 
> > where its looping on hd->tmPending (SUCCESS) or until timeout expires
> > (FAILURE).
> > 
> > Every thing works fine in the success case, in the failure case we
> > return to mptscsih_IssueTaskMgmt, and call mpt_HardResetHandler, where
> > we throw all inflight commands (hard reset) and free request via
> > mpt_free_msg_frame. Which also seems fine.
> > 
> > Now notice in the loop above that in the last iteration we sleep for
> > 250ms after the last read to tmPending, the TM command could complete
> > between this check of tmPending and before the adapter is hard reset. 
> > In such a case the request is double freed once when the command
> > completes (interrupt gets delivered and mpt_reply releases request
> > frame), and once after the mpt_HardResetHandler. This results in
> > corruption of  the linked list of empty request frames and causes a oops
> > on next list access.
> > 
> > The patch below attempts to fix this race by freeing the request_frame
> > before mpt_HardReset and doing that atomically after checking for
> > tmPending value. Also the mptscsih_taskmgmt_complete function returns
> > "1" irrespective of the tmPending value which results in always freeing
> > of the request frame from mpt_reply, so I have modified that function to
> > look at the tmPending value before returning. 
> > 
> > While I am at it, I have also added a BUG_ON statement in
> > mpt_free_msg_frame to check for double free, instead of silently
> > corrupting the list.
> > 
> > Please consider the patch below for inclusion.
> > 
> > 
> > Signed-off-by: Alok N Kataria <akataria@vmware.com>
> > Cc: <stable@kernel.org>
> > ---
> > 
> >  drivers/message/fusion/mptbase.c  |   11 ++++-------
> >  drivers/message/fusion/mptbase.h  |   10 +++++++++-
> >  drivers/message/fusion/mptscsih.c |   25 +++++++++++++++++++++----
> >  3 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > 
> > diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> > index 5d496a9..a1637bf 100644
> > --- a/drivers/message/fusion/mptbase.c
> > +++ b/drivers/message/fusion/mptbase.c
> > @@ -974,7 +974,7 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> >  /**
> > - *	mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> > + *	__mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> >   *	@ioc: Pointer to MPT adapter structure
> >   *	@mf: Pointer to MPT request frame
> >   *
> > @@ -982,18 +982,15 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >   *	FreeQ.
> >   */
> >  void
> > -mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> > +__mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >  {
> > -	unsigned long flags;
> > -
> > +	BUG_ON(mf->u.frame.linkage.arg1 == 0xdeadbeaf);
> >  	/*  Put Request back on FreeQ!  */
> > -	spin_lock_irqsave(&ioc->FreeQlock, flags);
> >  	mf->u.frame.linkage.arg1 = 0xdeadbeaf; /* signature to know if this mf is freed */
> >  	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
> >  #ifdef MFCNT
> >  	ioc->mfcnt--;
> >  #endif
> > -	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >  }
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> > @@ -7612,7 +7609,7 @@ EXPORT_SYMBOL(mpt_device_driver_deregister);
> >  EXPORT_SYMBOL(mpt_get_msg_frame);
> >  EXPORT_SYMBOL(mpt_put_msg_frame);
> >  EXPORT_SYMBOL(mpt_put_msg_frame_hi_pri);
> > -EXPORT_SYMBOL(mpt_free_msg_frame);
> > +EXPORT_SYMBOL(__mpt_free_msg_frame);
> >  EXPORT_SYMBOL(mpt_add_sge);
> >  EXPORT_SYMBOL(mpt_send_handshake_request);
> >  EXPORT_SYMBOL(mpt_verify_adapter);
> > diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
> > index b3e981d..4e4ee53 100644
> > --- a/drivers/message/fusion/mptbase.h
> > +++ b/drivers/message/fusion/mptbase.h
> > @@ -906,7 +906,7 @@ extern void	 mpt_reset_deregister(u8 cb_idx);
> >  extern int	 mpt_device_driver_register(struct mpt_pci_driver * dd_cbfunc, u8 cb_idx);
> >  extern void	 mpt_device_driver_deregister(u8 cb_idx);
> >  extern MPT_FRAME_HDR	*mpt_get_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc);
> > -extern void	 mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> > +extern void	 __mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_put_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_add_sge(char *pAddr, u32 flagslength, dma_addr_t dma_addr);
> > @@ -924,6 +924,14 @@ extern int	 mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode);
> >  extern int	 mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk);
> >  extern void     mpt_halt_firmware(MPT_ADAPTER *ioc);
> >  
> > +static inline void
> > +mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> > +{
> > +	unsigned long flags;
> > +	spin_lock_irqsave(&ioc->FreeQlock, flags);
> > +	__mpt_free_msg_frame(ioc, mf);
> > +	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +}
> >  
> >  /*
> >   *  Public data decl's...
> > diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
> > index e62c6bc..afe7fc0 100644
> > --- a/drivers/message/fusion/mptscsih.c
> > +++ b/drivers/message/fusion/mptscsih.c
> > @@ -1719,6 +1719,18 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
> >  	}
> >  
> >  	if(mptscsih_tm_wait_for_completion(hd, timeout) == FAILED) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&ioc->FreeQlock, flags);
> > +		if(hd->tmPending == 0) {
> > +                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +                        goto out_success;
> > +                }
> > +		__mpt_free_msg_frame(ioc, mf);
> > +		hd->tmPending = 0;
> > +		hd->tmState = TM_STATE_NONE;
> > +		spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +
> >  		dfailprintk(ioc, printk(MYIOC_s_ERR_FMT "task management request TIMED OUT!"
> >  			" (hd %p, ioc %p, mf %p) \n", ioc->name, hd,
> >  			ioc, mf));
> > @@ -1727,9 +1739,10 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
> >  		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
> >  		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
> >  			 ioc->name, retval));
> > -		goto fail_out;
> > +		return FAILED;
> >  	}
> >  
> > +out_success:
> >  	/*
> >  	 * Handle success case, see if theres a non-zero ioc_status.
> >  	 */
> > @@ -2159,6 +2172,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
> >  	u16			 iocstatus;
> >  	u8			 tmType;
> >  	u32			 termination_count;
> > +	int			 retval = 1;
> >  
> >  	dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "TaskMgmt completed (mf=%p,mr=%p)\n",
> >  	    ioc->name, mf, mr));
> > @@ -2235,12 +2249,15 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
> >  
> >   out:
> >  	spin_lock_irqsave(&ioc->FreeQlock, flags);
> > -	hd->tmPending = 0;
> > -	hd->tmState = TM_STATE_NONE;
> > +	if (hd->tmPending) {
> > +		hd->tmPending = 0;
> > +		hd->tmState = TM_STATE_NONE;
> > +	} else
> > +		retval = 0;
> >  	hd->tm_iocstatus = iocstatus;
> >  	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >  
> > -	return 1;
> > +	return retval;
> >  }
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Fix double free of MPT request frames.
  2009-05-30 18:34     ` Alok Kataria
@ 2009-06-03  9:24       ` Desai, Kashyap
  0 siblings, 0 replies; 7+ messages in thread
From: Desai, Kashyap @ 2009-06-03  9:24 UTC (permalink / raw)
  To: akataria
  Cc: Moore, Eric, James.Bottomley, DL-MPT Fusion Linux, linux-scsi,
	Prakash, Sathya

Alok,

Answers inline.

Thanks,
Kashyap


-----Original Message-----
From: Alok Kataria [mailto:akataria@vmware.com] 
Sent: Sunday, May 31, 2009 12:05 AM
To: Desai, Kashyap
Cc: Moore, Eric; James.Bottomley@HansenPartnership.com; DL-MPT Fusion Linux; linux-scsi@vger.kernel.org; Prakash, Sathya
Subject: RE: [PATCH] Fix double free of MPT request frames.


On Fri, 2009-05-29 at 22:56 -0700, Desai, Kashyap wrote:
> Alok,
> 
> This part of code is obsolete in LSI's new driver. 
> Very soon you will not see " mptscsih_tm_wait_for_completion" fuction.
> It is replaced by "wait_for_completion_timeout()".

Where is that code checked in ? I don't see it for 2.6.30..
Anyways we still need a fix for the previous kernels. 
---> This code is submitted to upstream. Not yet part of any release.
	Check this patch set: 

	http://marc.info/?l=linux-scsi&m=124359642904576&w=2

> 
> I will say for your kernel panic due to double free in msg frame link
> list can be avoided by changing " mpt_free_msg_frame".

Yeah we thought about it.
But that still leaves another race, there is no guarantee that the frame
would not be reused by mpt_HardResetHandler. Since after reset it can
call SendEventNotification, which can allocate message frame, which may
then confuse your freeing logic. If that happens it would result in we
freeing some other used frame. Another thing that we thought about was
having this check in mpt_free_msg_frame along with moving the
free_msg_frame call before mpt_HardResetHandler, but that too leaves a
window open for the request frame to be reused due to the 250ms msleep.

IMHO the best way to fix this is avoid having these double free calls in
the first place rather than putting in these checks in the
mpt_free_msg_frame function.

---> I agree with this concern. There is some other problem exist in your patch which I thought because you are trying to avoid above condition.

Consider below two points as potential bug in your patch.

#1. As per MPT Fusion design, driver should not free message frame which is still with FW. We should make sure that FW will not use submitted message frame in future before freeing it. There are multiple way to make sure FW will not use msg frame. 
	a. Successful return in ISR for msg frame.
	b. HardReset
	c. Task Abort (success case)

Potential bug in your patch is you are freeing msg frame before HardReset.
There is still possibility that FW will use that freed message frame. It can fault IOC or may be IOC hangs.

#2. Making TmState to NONE much before completing cleanup work for previously faild TM.  In general, TmState = TM_STATE_NONE should be just before you return from mptscsih_tm_wait_for_completion() failure case.






Thanks,
Alok
>  See snippet of code below.
> ------------------------------------------------------------------
> void
> mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> {
> 	unsigned long flags;
> 
> 	/*  Put Request back on FreeQ!  */
> 	spin_lock_irqsave(&ioc->FreeQlock, flags);
> 	if (cpu_to_le32(mf->u.frame.linkage.arg1) == 0xdeadbeaf)  <----------
> 		goto out;
> 	/* signature to know if this mf is freed */
> 	mf->u.frame.linkage.arg1 = cpu_to_le32(0xdeadbeaf);
> 	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
> out:
> 	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> }
> ----------------------------------------------------------------
> 
> 
> This is how we are doing in LSI's Inbox driver. 
> 
> 
> Thanks,
> Kashyap
> 
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Alok Kataria
> Sent: Thursday, May 28, 2009 4:25 AM
> To: Moore, Eric; James.Bottomley@HansenPartnership.com
> Cc: DL-MPT Fusion Linux; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] Fix double free of MPT request frames.
> 
> Ccing James...any comments on the patch below ? 
> 
> Thanks,
> Alok
> 
> On Tue, 2009-05-26 at 12:25 -0700, Alok Kataria wrote:
> > Hi, 
> > 
> > While testing scsi path failover for disks using MPT drivers we hit a
> > kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> > that this is due to a race present in the mpt scsi code path. The same
> > race seems to be present in the latest git kernel code too.
> > 
> > Here is the description of the race,
> > If a command has run for too long - which can happen often in the
> > failover case - the SCSI stack decides to abort that command and invokes
> > a device's abort handler. 
> > The code flow is...
> > 
> > mptscsih_abort ==> mptscsih_TMHandle (ABORT_TASK) ==>
> > mptscsih_IssueTaskMgmt (submits the command ) ==>
> > mptscsih_tm_wait_for_completion()
> > 
> > the mptscsih_tm_wait_for_completion code looks like this
> > 
> > do {
> >                 spin_lock_irqsave(&ioc->FreeQlock, flags);
> >                 if(hd->tmPending == 0) {
> >                         status = SUCCESS;
> >                         spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >                         break;
> >                 }
> >                 spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >                 msleep(250);                        <<==================
> >         } while (--loop_count);
> > 
> > where its looping on hd->tmPending (SUCCESS) or until timeout expires
> > (FAILURE).
> > 
> > Every thing works fine in the success case, in the failure case we
> > return to mptscsih_IssueTaskMgmt, and call mpt_HardResetHandler, where
> > we throw all inflight commands (hard reset) and free request via
> > mpt_free_msg_frame. Which also seems fine.
> > 
> > Now notice in the loop above that in the last iteration we sleep for
> > 250ms after the last read to tmPending, the TM command could complete
> > between this check of tmPending and before the adapter is hard reset. 
> > In such a case the request is double freed once when the command
> > completes (interrupt gets delivered and mpt_reply releases request
> > frame), and once after the mpt_HardResetHandler. This results in
> > corruption of  the linked list of empty request frames and causes a oops
> > on next list access.
> > 
> > The patch below attempts to fix this race by freeing the request_frame
> > before mpt_HardReset and doing that atomically after checking for
> > tmPending value. Also the mptscsih_taskmgmt_complete function returns
> > "1" irrespective of the tmPending value which results in always freeing
> > of the request frame from mpt_reply, so I have modified that function to
> > look at the tmPending value before returning. 
> > 
> > While I am at it, I have also added a BUG_ON statement in
> > mpt_free_msg_frame to check for double free, instead of silently
> > corrupting the list.
> > 
> > Please consider the patch below for inclusion.
> > 
> > 
> > Signed-off-by: Alok N Kataria <akataria@vmware.com>
> > Cc: <stable@kernel.org>
> > ---
> > 
> >  drivers/message/fusion/mptbase.c  |   11 ++++-------
> >  drivers/message/fusion/mptbase.h  |   10 +++++++++-
> >  drivers/message/fusion/mptscsih.c |   25 +++++++++++++++++++++----
> >  3 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > 
> > diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> > index 5d496a9..a1637bf 100644
> > --- a/drivers/message/fusion/mptbase.c
> > +++ b/drivers/message/fusion/mptbase.c
> > @@ -974,7 +974,7 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> >  /**
> > - *	mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> > + *	__mpt_free_msg_frame - Place MPT request frame back on FreeQ.
> >   *	@ioc: Pointer to MPT adapter structure
> >   *	@mf: Pointer to MPT request frame
> >   *
> > @@ -982,18 +982,15 @@ mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >   *	FreeQ.
> >   */
> >  void
> > -mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> > +__mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> >  {
> > -	unsigned long flags;
> > -
> > +	BUG_ON(mf->u.frame.linkage.arg1 == 0xdeadbeaf);
> >  	/*  Put Request back on FreeQ!  */
> > -	spin_lock_irqsave(&ioc->FreeQlock, flags);
> >  	mf->u.frame.linkage.arg1 = 0xdeadbeaf; /* signature to know if this mf is freed */
> >  	list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ);
> >  #ifdef MFCNT
> >  	ioc->mfcnt--;
> >  #endif
> > -	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >  }
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> > @@ -7612,7 +7609,7 @@ EXPORT_SYMBOL(mpt_device_driver_deregister);
> >  EXPORT_SYMBOL(mpt_get_msg_frame);
> >  EXPORT_SYMBOL(mpt_put_msg_frame);
> >  EXPORT_SYMBOL(mpt_put_msg_frame_hi_pri);
> > -EXPORT_SYMBOL(mpt_free_msg_frame);
> > +EXPORT_SYMBOL(__mpt_free_msg_frame);
> >  EXPORT_SYMBOL(mpt_add_sge);
> >  EXPORT_SYMBOL(mpt_send_handshake_request);
> >  EXPORT_SYMBOL(mpt_verify_adapter);
> > diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
> > index b3e981d..4e4ee53 100644
> > --- a/drivers/message/fusion/mptbase.h
> > +++ b/drivers/message/fusion/mptbase.h
> > @@ -906,7 +906,7 @@ extern void	 mpt_reset_deregister(u8 cb_idx);
> >  extern int	 mpt_device_driver_register(struct mpt_pci_driver * dd_cbfunc, u8 cb_idx);
> >  extern void	 mpt_device_driver_deregister(u8 cb_idx);
> >  extern MPT_FRAME_HDR	*mpt_get_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc);
> > -extern void	 mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> > +extern void	 __mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_put_msg_frame(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_put_msg_frame_hi_pri(u8 cb_idx, MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf);
> >  extern void	 mpt_add_sge(char *pAddr, u32 flagslength, dma_addr_t dma_addr);
> > @@ -924,6 +924,14 @@ extern int	 mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode);
> >  extern int	 mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk);
> >  extern void     mpt_halt_firmware(MPT_ADAPTER *ioc);
> >  
> > +static inline void
> > +mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf)
> > +{
> > +	unsigned long flags;
> > +	spin_lock_irqsave(&ioc->FreeQlock, flags);
> > +	__mpt_free_msg_frame(ioc, mf);
> > +	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +}
> >  
> >  /*
> >   *  Public data decl's...
> > diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
> > index e62c6bc..afe7fc0 100644
> > --- a/drivers/message/fusion/mptscsih.c
> > +++ b/drivers/message/fusion/mptscsih.c
> > @@ -1719,6 +1719,18 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
> >  	}
> >  
> >  	if(mptscsih_tm_wait_for_completion(hd, timeout) == FAILED) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&ioc->FreeQlock, flags);
> > +		if(hd->tmPending == 0) {
> > +                        spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +                        goto out_success;
> > +                }
> > +		__mpt_free_msg_frame(ioc, mf);
> > +		hd->tmPending = 0;
> > +		hd->tmState = TM_STATE_NONE;
> > +		spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> > +
> >  		dfailprintk(ioc, printk(MYIOC_s_ERR_FMT "task management request TIMED OUT!"
> >  			" (hd %p, ioc %p, mf %p) \n", ioc->name, hd,
> >  			ioc, mf));
> > @@ -1727,9 +1739,10 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun, i
> >  		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
> >  		dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
> >  			 ioc->name, retval));
> > -		goto fail_out;
> > +		return FAILED;
> >  	}
> >  
> > +out_success:
> >  	/*
> >  	 * Handle success case, see if theres a non-zero ioc_status.
> >  	 */
> > @@ -2159,6 +2172,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
> >  	u16			 iocstatus;
> >  	u8			 tmType;
> >  	u32			 termination_count;
> > +	int			 retval = 1;
> >  
> >  	dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "TaskMgmt completed (mf=%p,mr=%p)\n",
> >  	    ioc->name, mf, mr));
> > @@ -2235,12 +2249,15 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *m
> >  
> >   out:
> >  	spin_lock_irqsave(&ioc->FreeQlock, flags);
> > -	hd->tmPending = 0;
> > -	hd->tmState = TM_STATE_NONE;
> > +	if (hd->tmPending) {
> > +		hd->tmPending = 0;
> > +		hd->tmState = TM_STATE_NONE;
> > +	} else
> > +		retval = 0;
> >  	hd->tm_iocstatus = iocstatus;
> >  	spin_unlock_irqrestore(&ioc->FreeQlock, flags);
> >  
> > -	return 1;
> > +	return retval;
> >  }
> >  
> >  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Fix double free of MPT request frames.
  2009-12-10 22:36 Darius S. Naqvi
@ 2009-12-15  4:51 ` Desai, Kashyap
  0 siblings, 0 replies; 7+ messages in thread
From: Desai, Kashyap @ 2009-12-15  4:51 UTC (permalink / raw)
  To: Darius S. Naqvi, linux-scsi

Naqvi,

If you are seeing problem in upgrading whole kernel, I would rather suggest move mptfusion driver to latest and keep you base kernel 2.6.24-24-generic.

For this you may need to modify mpt fusion little bit to make driver compliable. 

Double free issue was present in older mptfusion driver and as you have observed latest mptfusion driver has changed a lot from there.
Now, with latest mptfusion driver this double free MPT frame issue is not present.

- Kashyap

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Darius S. Naqvi
> Sent: Friday, December 11, 2009 4:06 AM
> To: linux-scsi@vger.kernel.org
> Subject: RE: [PATCH] Fix double free of MPT request frames.
> 
> Hi,
> 
> On May 26, 2009 at 12:25 pm, Alok Kataria wrote:
> 
> > While testing scsi path failover for disks using MPT drivers we hit a
> > kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> > that this is due to a race present in the mpt scsi code path. The same
> > race seems to be present in the latest git kernel code too.
> >
> 
> I'm sorry to be bringing up something from months ago, but I do have
> some questions about this.  I believe I might be seeing the same
> problem.  We're running Ubuntu 8.04.3 LTS (linux kernel
> 2.6.24-24-generic), and the code is quite similar to your patch, to
> the point that the patch applies smoothly except for a couple of
> hunks in comments.  So I have a couple of questions:
> 
> 1) What was the kernel oops like?  In particular, was it like the one
>     I saw?  (I'll add details of the kernel oops later in this email).
> 
> 2) Has this patch or anything like it been accepted into the kernel?
>     I have my doubts, as Kashyap Desai had this to say about it on May
>     29, 2009 at 10:55pm (text re-formatted to break up long lines):
> 
> > Consider below two points as potential bug in your patch.
> >
> > #1. As per MPT Fusion design, driver should not free message frame
> >  which is still with FW. We should make sure that FW will not use
> >  submitted message frame in future before freeing it. There are
> >  multiple way to make sure FW will not use msg frame.
> >  a. Successful return in ISR for msg frame.
> >  b. HardReset
> >  c. Task Abort (success case)
> >
> > Potential bug in your patch is you are freeing msg frame before
> HardReset.
> > There is still possibility that FW will use that freed message
> > frame. It can fault IOC or may be IOC hangs.
> >
> > #2. Making TmState to NONE much before completing cleanup work for
> >  previously faild TM.  In general, TmState = TM_STATE_NONE should be
> >  just before you return from mptscsih_tm_wait_for_completion()
> >  failure case.
> >
> 
> 3) Is this fixed in the latest kernel (currently 2.6.32)?  The code
>     has changed quite a bit, so it isn't obvious that the problem still
>     exists, nor is it obvious that it has been fixed.
> 
> Unfortunately, it isn't easy for me to upgrade to the latest kernel,
> as we want to settle on a long-term supportable Unbuntu release for
> our deployments, so we use 8.04.3 LTS, for which the newest kernel so
> far is 2.6.24-26-generic.
> 
> Here are the details of my kernel oops, as I promised in #1 above:
> 
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933351] mptscsih: ioc0: Issue
> of TaskMgmt failed!
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933361] mptscsih: ioc0: task
> abort:FAILED (sc=f6b44a00)
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933373] mptscsih: ioc0:
> attempting target reset! (sc=f6b44a00)
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933377] sd 2:0:1:0: [sdb] CDB:
> Read(10): 28 00 01 20 aa c4 00 00 04 00
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933431]  target2:0:0:
> Beginning Domain Validation
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044445] mptscsih: ioc0: target
> reset: SUCCESS (sc=f6b44a00)
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044804]  target2:0:0: Domain
> Validation Initial Inquiry Failed
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044874]  target2:0:0: Ending
> DomainValidation
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle
> kernel NULL pointer dereference at virtual address 00000005
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045128] printing eip: f89235fa
> *pde= 00000000
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045299] Oops: 0002 [#1] SMP
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045413] Modules linked in:
> iscsi_trgt crc32c libcrc32c vmmemctl cpufreq_powersave cpufreq_userspace
> cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table video
> output battery sbs sbshc dock iptable_filter ip_tables x_tables iscsi_tcp
> libiscsi scsi_transport_iscsi lploop ipv6 container evdev serio_raw ac
> button intel_agp parport_pc parport i2c_piix4 agpgart shpchp pci_hotplug
> i2c_core psmouse pcspkr ext3 jbd mbcache sd_modsg sr_mod cdrom ata_generic
> floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix
> pata_acpi libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx
> xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod
> thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046097]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046177] Pid: 6, comm: events/0
> Not tainted (2.6.24-24-generic #1)
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046237] EIP: 0060:[<f89235fa>]
> EFLAGS: 00010097 CPU: 0
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046480] EIP is at
> mpt_get_msg_frame+0x6a/0x100 [mptbase]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046541] EAX: 0000400a EBX:
> f7fc5000ECX: df8c49c0 EDX: 00000001
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046599] ESI: df8c49c0 EDI:
> 0000000fEBP: f7fc5104 ESP: f7c2bde8
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046656]  DS: 007b ES: 007b FS:
> 00d8GS: 0000 SS: 0068
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046723] Process events/0 (pid:
> 6, ti=f7c2a000 task=f7c28000 task.ti=f7c2a000)
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046788] Stack: 00000000
> c01731e0 00000044 f7c2be6c 0000000a 0f000082 00000202 f7fc5000
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046922]        f7ea6c14
> f7c2bec4 f7fc5000 f8926c5d 00000001 c03ef590 000000d0 0000000c
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047015]        00000000
> 0000000c f7c8444c 00000000 c0109173 00000000 f7c2bf24 f7c8444c
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047111] Call Trace:
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047269]  [<c01731e0>]
> __alloc_pages+0x60/0x3a0
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047453]  [<f8926c5d>]
> mpt_config+0x2d/0x2f0 [mptbase]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047510]  [<c0109173>]
> dma_alloc_coherent+0xc3/0x110
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047567]  [<f88df86f>]
> mptspi_read_parameters+0x12f/0x3f0 [mptspi]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047636]  [<f88e016f>]
> mptspi_dv_device+0x6f/0x170 [mptspi]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047688]  [<c028063e>]
> get_device+0xe/0x20
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047744]  [<f893a13e>]
> scsi_device_get+0x1e/0x50 [scsi_mod]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047814]  [<f893a298>]
> __scsi_iterate_devices+0x48/0x70 [scsi_mod]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047879]  [<f88e0319>]
> mptspi_dv_renegotiate_work+0xa9/0xd0 [mptspi]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047935]  [<c0179750>]
> vmstat_update+0x0/0x30
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047988]  [<f88e0270>]
> mptspi_dv_renegotiate_work+0x0/0xd0 [mptspi]
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048043]  [<c013cecf>]
> run_workqueue+0xbf/0x160
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048094]  [<c013d970>]
> worker_thread+0x0/0xe0
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048141]  [<c013d9f4>]
> worker_thread+0x84/0xe0
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048188]  [<c0140c80>]
> autoremove_wake_function+0x0/0x40
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048240]  [<c013d970>]
> worker_thread+0x0/0xe0
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048287]  [<c01409c2>]
> kthread+0x42/0x70
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048331]  [<c0140980>]
> kthread+0x0/0x70
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048375]  [<c0105667>]
> kernel_thread_helper+0x7/0x10
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048479]
> =======================
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048536] Code: 00 8d ab 04 01
> 00 00 89 e8 e8 d3 9e 9f c7 8b 93 08 01 00 00 89 44 24 18 8d 83 08 01 00 00
> 39 c2 74 5489 d6 8b 12 8b 46 04 89 f1 <89> 42 04 89 10 89 f8 c7 46 08 00
> 00 00 00 c7 06 0001 10 00 88
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048824] EIP: [<f89235fa>]
> mpt_get_msg_frame+0x6a/0x100 [mptbase] SS:ESP 0068:f7c2bde8
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.049231] ---[ end trace
> 4cf91050ea603706 ]---
> 
> I used objdump and matched up its output to the oops, and I concluded the
> oops occured at 5fa below:
> 
> /home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:157
>   * This is only for internal list manipulation where we know
>   * the prev/next entries already!
>   */
> static inline void __list_del(struct list_head * prev, struct list_head *
> next)
> {
>          next->prev = prev;
>       5fa:       89 42 04                mov    %eax,0x4(%edx)
> /home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:158
>          prev->next = next;
>       5fd:       89 10                   mov    %edx,(%eax)
> 
> The oops reports that EDX is 0x1, which means 5fa is trying to write
> to the address 0x5, which fits with this line in the oops:
> 
> Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle
> kernel NULL pointer dereference at virtual address 00000005
> 
> In short, a next pointer in the message frame has a ridiculous value,
> which could be caused by using a message frame that has already been
> freed.
> 
> --
> Darius S. Naqvi
> dnaqvi@datagardens.com
> http://www.datagardens.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Fix double free of MPT request frames.
@ 2009-12-10 22:36 Darius S. Naqvi
  2009-12-15  4:51 ` Desai, Kashyap
  0 siblings, 1 reply; 7+ messages in thread
From: Darius S. Naqvi @ 2009-12-10 22:36 UTC (permalink / raw)
  To: linux-scsi

Hi,

On May 26, 2009 at 12:25 pm, Alok Kataria wrote:

> While testing scsi path failover for disks using MPT drivers we hit a
> kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> that this is due to a race present in the mpt scsi code path. The same
> race seems to be present in the latest git kernel code too.
>

I'm sorry to be bringing up something from months ago, but I do have
some questions about this.  I believe I might be seeing the same
problem.  We're running Ubuntu 8.04.3 LTS (linux kernel
2.6.24-24-generic), and the code is quite similar to your patch, to
the point that the patch applies smoothly except for a couple of
hunks in comments.  So I have a couple of questions:

1) What was the kernel oops like?  In particular, was it like the one
    I saw?  (I'll add details of the kernel oops later in this email).

2) Has this patch or anything like it been accepted into the kernel?
    I have my doubts, as Kashyap Desai had this to say about it on May
    29, 2009 at 10:55pm (text re-formatted to break up long lines):

> Consider below two points as potential bug in your patch.
> 
> #1. As per MPT Fusion design, driver should not free message frame
>  which is still with FW. We should make sure that FW will not use
>  submitted message frame in future before freeing it. There are
>  multiple way to make sure FW will not use msg frame.
>  a. Successful return in ISR for msg frame.
>  b. HardReset
>  c. Task Abort (success case)
> 
> Potential bug in your patch is you are freeing msg frame before HardReset.
> There is still possibility that FW will use that freed message
> frame. It can fault IOC or may be IOC hangs.
> 
> #2. Making TmState to NONE much before completing cleanup work for
>  previously faild TM.  In general, TmState = TM_STATE_NONE should be
>  just before you return from mptscsih_tm_wait_for_completion()
>  failure case.
>

3) Is this fixed in the latest kernel (currently 2.6.32)?  The code
    has changed quite a bit, so it isn't obvious that the problem still
    exists, nor is it obvious that it has been fixed.

Unfortunately, it isn't easy for me to upgrade to the latest kernel,
as we want to settle on a long-term supportable Unbuntu release for
our deployments, so we use 8.04.3 LTS, for which the newest kernel so
far is 2.6.24-26-generic.

Here are the details of my kernel oops, as I promised in #1 above:

Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933351] mptscsih: ioc0: Issue of TaskMgmt failed!
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933361] mptscsih: ioc0: task abort:FAILED (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933373] mptscsih: ioc0: attempting target reset! (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933377] sd 2:0:1:0: [sdb] CDB: Read(10): 28 00 01 20 aa c4 00 00 04 00
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933431]  target2:0:0: Beginning Domain Validation
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044445] mptscsih: ioc0: target reset: SUCCESS (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044804]  target2:0:0: Domain Validation Initial Inquiry Failed
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044874]  target2:0:0: Ending DomainValidation
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045128] printing eip: f89235fa *pde= 00000000
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045299] Oops: 0002 [#1] SMP
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045413] Modules linked in: iscsi_trgt crc32c libcrc32c vmmemctl cpufreq_powersave cpufreq_userspace cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table video output battery sbs sbshc dock iptable_filter ip_tables x_tables iscsi_tcp libiscsi scsi_transport_iscsi lploop ipv6 container evdev serio_raw ac button intel_agp parport_pc parport i2c_piix4 agpgart shpchp pci_hotplug i2c_core psmouse pcspkr ext3 jbd mbcache sd_modsg sr_mod cdrom ata_generic floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix pata_acpi libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046097]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046177] Pid: 6, comm: events/0 Not tainted (2.6.24-24-generic #1)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046237] EIP: 0060:[<f89235fa>] EFLAGS: 00010097 CPU: 0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046480] EIP is at mpt_get_msg_frame+0x6a/0x100 [mptbase]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046541] EAX: 0000400a EBX: f7fc5000ECX: df8c49c0 EDX: 00000001
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046599] ESI: df8c49c0 EDI: 0000000fEBP: f7fc5104 ESP: f7c2bde8
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046656]  DS: 007b ES: 007b FS: 00d8GS: 0000 SS: 0068
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046723] Process events/0 (pid: 6, ti=f7c2a000 task=f7c28000 task.ti=f7c2a000)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046788] Stack: 00000000 c01731e0 00000044 f7c2be6c 0000000a 0f000082 00000202 f7fc5000
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046922]        f7ea6c14 f7c2bec4 f7fc5000 f8926c5d 00000001 c03ef590 000000d0 0000000c
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047015]        00000000 0000000c f7c8444c 00000000 c0109173 00000000 f7c2bf24 f7c8444c
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047111] Call Trace:
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047269]  [<c01731e0>] __alloc_pages+0x60/0x3a0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047453]  [<f8926c5d>] mpt_config+0x2d/0x2f0 [mptbase]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047510]  [<c0109173>] dma_alloc_coherent+0xc3/0x110
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047567]  [<f88df86f>] mptspi_read_parameters+0x12f/0x3f0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047636]  [<f88e016f>] mptspi_dv_device+0x6f/0x170 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047688]  [<c028063e>] get_device+0xe/0x20
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047744]  [<f893a13e>] scsi_device_get+0x1e/0x50 [scsi_mod]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047814]  [<f893a298>] __scsi_iterate_devices+0x48/0x70 [scsi_mod]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047879]  [<f88e0319>] mptspi_dv_renegotiate_work+0xa9/0xd0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047935]  [<c0179750>] vmstat_update+0x0/0x30
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047988]  [<f88e0270>] mptspi_dv_renegotiate_work+0x0/0xd0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048043]  [<c013cecf>] run_workqueue+0xbf/0x160
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048094]  [<c013d970>] worker_thread+0x0/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048141]  [<c013d9f4>] worker_thread+0x84/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048188]  [<c0140c80>] autoremove_wake_function+0x0/0x40
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048240]  [<c013d970>] worker_thread+0x0/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048287]  [<c01409c2>] kthread+0x42/0x70
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048331]  [<c0140980>] kthread+0x0/0x70
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048375]  [<c0105667>] kernel_thread_helper+0x7/0x10
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048479]  =======================
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048536] Code: 00 8d ab 04 01 00 00 89 e8 e8 d3 9e 9f c7 8b 93 08 01 00 00 89 44 24 18 8d 83 08 01 00 00 39 c2 74 5489 d6 8b 12 8b 46 04 89 f1 <89> 42 04 89 10 89 f8 c7 46 08 00 00 00 00 c7 06 0001 10 00 88
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048824] EIP: [<f89235fa>] mpt_get_msg_frame+0x6a/0x100 [mptbase] SS:ESP 0068:f7c2bde8
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.049231] ---[ end trace 4cf91050ea603706 ]---

I used objdump and matched up its output to the oops, and I concluded the oops occured at 5fa below:

/home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:157
  * This is only for internal list manipulation where we know
  * the prev/next entries already!
  */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
         next->prev = prev;
      5fa:       89 42 04                mov    %eax,0x4(%edx)
/home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:158
         prev->next = next;
      5fd:       89 10                   mov    %edx,(%eax)

The oops reports that EDX is 0x1, which means 5fa is trying to write
to the address 0x5, which fits with this line in the oops:

Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005

In short, a next pointer in the message frame has a ridiculous value,
which could be caused by using a message frame that has already been
freed.

-- 
Darius S. Naqvi
dnaqvi@datagardens.com
http://www.datagardens.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-12-15  4:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-26 19:25 [PATCH] Fix double free of MPT request frames Alok Kataria
2009-05-27 22:54 ` Alok Kataria
2009-05-30  5:56   ` Desai, Kashyap
2009-05-30 18:34     ` Alok Kataria
2009-06-03  9:24       ` Desai, Kashyap
2009-12-10 22:36 Darius S. Naqvi
2009-12-15  4:51 ` Desai, Kashyap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.