All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition
@ 2023-03-18  8:13 Zheng Wang
  2023-03-20 16:11 ` Mike Christie
  0 siblings, 1 reply; 5+ messages in thread
From: Zheng Wang @ 2023-03-18  8:13 UTC (permalink / raw)
  To: njavali
  Cc: mrangankar, GR-QLogic-Storage-Upstream, jejb, martin.petersen,
	linux-scsi, linux-kernel, hackerzheng666, 1395428693sheep,
	alex000young, Zheng Wang

In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
with qedi_recovery_handler and bound &qedi->board_disable_work
with qedi_board_disable_work.

When it calls qedi_schedule_recovery_handler, it will finally
call schedule_delayed_work to start the work.

When we call qedi_remove to remove the driver, there
may be a sequence as follows:

Fix it by finishing the work before cleanup in qedi_remove.

CPU0                  CPU1

                     |qedi_recovery_handler
qedi_remove          |
  __qedi_remove      |
iscsi_host_free      |
scsi_host_put        |
//free shost         |
                     |iscsi_host_for_each_session
                     |//use qedi->shost

Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
 drivers/scsi/qedi/qedi_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index f2ee49756df8..25223f6f5344 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 	int rval;
 	u16 retry = 10;
 
+	/*cancel work*/
+	cancel_delayed_work_sync(&qedi->recovery_work);
+	cancel_delayed_work_sync(&qedi->board_disable_work);
+
 	if (mode == QEDI_MODE_NORMAL)
 		iscsi_host_remove(qedi->shost, false);
 	else if (mode == QEDI_MODE_SHUTDOWN)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition
  2023-03-18  8:13 [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition Zheng Wang
@ 2023-03-20 16:11 ` Mike Christie
  2023-03-23  3:44   ` Zheng Hacker
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Christie @ 2023-03-20 16:11 UTC (permalink / raw)
  To: Zheng Wang, njavali
  Cc: mrangankar, GR-QLogic-Storage-Upstream, jejb, martin.petersen,
	linux-scsi, linux-kernel, hackerzheng666, 1395428693sheep,
	alex000young

On 3/18/23 3:13 AM, Zheng Wang wrote:
> In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
> with qedi_recovery_handler and bound &qedi->board_disable_work
> with qedi_board_disable_work.
> 
> When it calls qedi_schedule_recovery_handler, it will finally
> call schedule_delayed_work to start the work.
> 
> When we call qedi_remove to remove the driver, there
> may be a sequence as follows:
> 
> Fix it by finishing the work before cleanup in qedi_remove.
> 
> CPU0                  CPU1
> 
>                      |qedi_recovery_handler
> qedi_remove          |
>   __qedi_remove      |
> iscsi_host_free      |
> scsi_host_put        |
> //free shost         |
>                      |iscsi_host_for_each_session
>                      |//use qedi->shost
> 
> Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> ---
>  drivers/scsi/qedi/qedi_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index f2ee49756df8..25223f6f5344 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
>  	int rval;
>  	u16 retry = 10;
>  
> +	/*cancel work*/

This comment is not needed. The name of the functions you are calling have
"cancel" and "work" in them so we know. If you want to add a comment explain
why the cancel calls are needed here.


> +	cancel_delayed_work_sync(&qedi->recovery_work);
> +	cancel_delayed_work_sync(&qedi->board_disable_work);


How do you know after you have called cancel_delayed_work_sync that
schedule_recovery_handler or schedule_hw_err_handler can't be called?
I don't know the qed driver well, but it looks like you could have
operations still running, so after you cancel here one of those ops
could lead to them scheduling the work again.


> +
>  	if (mode == QEDI_MODE_NORMAL)
>  		iscsi_host_remove(qedi->shost, false);
>  	else if (mode == QEDI_MODE_SHUTDOWN)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition
  2023-03-20 16:11 ` Mike Christie
@ 2023-03-23  3:44   ` Zheng Hacker
  2023-03-23 10:17     ` [EXT] " Manish Rangankar
  0 siblings, 1 reply; 5+ messages in thread
From: Zheng Hacker @ 2023-03-23  3:44 UTC (permalink / raw)
  To: Mike Christie
  Cc: Zheng Wang, njavali, mrangankar, GR-QLogic-Storage-Upstream,
	jejb, martin.petersen, linux-scsi, linux-kernel, 1395428693sheep,
	alex000young

Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写道:
>
> On 3/18/23 3:13 AM, Zheng Wang wrote:
> > In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
> > with qedi_recovery_handler and bound &qedi->board_disable_work
> > with qedi_board_disable_work.
> >
> > When it calls qedi_schedule_recovery_handler, it will finally
> > call schedule_delayed_work to start the work.
> >
> > When we call qedi_remove to remove the driver, there
> > may be a sequence as follows:
> >
> > Fix it by finishing the work before cleanup in qedi_remove.
> >
> > CPU0                  CPU1
> >
> >                      |qedi_recovery_handler
> > qedi_remove          |
> >   __qedi_remove      |
> > iscsi_host_free      |
> > scsi_host_put        |
> > //free shost         |
> >                      |iscsi_host_for_each_session
> >                      |//use qedi->shost
> >
> > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > ---
> >  drivers/scsi/qedi/qedi_main.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> > index f2ee49756df8..25223f6f5344 100644
> > --- a/drivers/scsi/qedi/qedi_main.c
> > +++ b/drivers/scsi/qedi/qedi_main.c
> > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
> >       int rval;
> >       u16 retry = 10;
> >
> > +     /*cancel work*/
>
> This comment is not needed. The name of the functions you are calling have
> "cancel" and "work" in them so we know. If you want to add a comment explain
> why the cancel calls are needed here.
>

Hi,

Sorry for my late reply and thanks for your advice. Will remove it in
the next version of patch.

>
> > +     cancel_delayed_work_sync(&qedi->recovery_work);
> > +     cancel_delayed_work_sync(&qedi->board_disable_work);
>
>
> How do you know after you have called cancel_delayed_work_sync that
> schedule_recovery_handler or schedule_hw_err_handler can't be called?
> I don't know the qed driver well, but it looks like you could have
> operations still running, so after you cancel here one of those ops
> could lead to them scheduling the work again.
>

Sorry I didn't know how to make sure there's no more schedule. But I do think
this is important. Maybe there're someone else who can give us advice.

Best regards,
Zheng
>
> > +
> >       if (mode == QEDI_MODE_NORMAL)
> >               iscsi_host_remove(qedi->shost, false);
> >       else if (mode == QEDI_MODE_SHUTDOWN)
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition
  2023-03-23  3:44   ` Zheng Hacker
@ 2023-03-23 10:17     ` Manish Rangankar
  2023-03-24 15:21       ` Zheng Hacker
  0 siblings, 1 reply; 5+ messages in thread
From: Manish Rangankar @ 2023-03-23 10:17 UTC (permalink / raw)
  To: Zheng Hacker, Mike Christie
  Cc: Zheng Wang, Nilesh Javali, GR-QLogic-Storage-Upstream, jejb,
	martin.petersen, linux-scsi, linux-kernel, 1395428693sheep,
	alex000young



> -----Original Message-----
> From: Zheng Hacker <hackerzheng666@gmail.com>
> Sent: Thursday, March 23, 2023 9:15 AM
> To: Mike Christie <michael.christie@oracle.com>
> Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>;
> Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage-
> Upstream <GR-QLogic-Storage-Upstream@marvell.com>;
> jejb@linux.ibm.com; martin.petersen@oracle.com; linux-
> scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> 1395428693sheep@gmail.com; alex000young@gmail.com
> Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> qedi_remove due to race condition
> 
> External Email
> 
> ----------------------------------------------------------------------
> Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写
> 道:
> >
> > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > In qedi_probe, it calls __qedi_probe, which bound
> > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > &qedi->board_disable_work with qedi_board_disable_work.
> > >
> > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > schedule_delayed_work to start the work.
> > >
> > > When we call qedi_remove to remove the driver, there may be a
> > > sequence as follows:
> > >
> > > Fix it by finishing the work before cleanup in qedi_remove.
> > >
> > > CPU0                  CPU1
> > >
> > >                      |qedi_recovery_handler
> > > qedi_remove          |
> > >   __qedi_remove      |
> > > iscsi_host_free      |
> > > scsi_host_put        |
> > > //free shost         |
> > >                      |iscsi_host_for_each_session
> > >                      |//use qedi->shost
> > >
> > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > ---
> > >  drivers/scsi/qedi/qedi_main.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > 100644
> > > --- a/drivers/scsi/qedi/qedi_main.c
> > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> *pdev, int mode)
> > >       int rval;
> > >       u16 retry = 10;
> > >
> > > +     /*cancel work*/
> >
> > This comment is not needed. The name of the functions you are calling
> > have "cancel" and "work" in them so we know. If you want to add a
> > comment explain why the cancel calls are needed here.
> >
> 
> Hi,
> 
> Sorry for my late reply and thanks for your advice. Will remove it in the next
> version of patch.
> 
> >
> > > +     cancel_delayed_work_sync(&qedi->recovery_work);
> > > +     cancel_delayed_work_sync(&qedi->board_disable_work);
> >
> >
> > How do you know after you have called cancel_delayed_work_sync that
> > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > I don't know the qed driver well, but it looks like you could have
> > operations still running, so after you cancel here one of those ops
> > could lead to them scheduling the work again.
> >
> 
> Sorry I didn't know how to make sure there's no more schedule. But I do
> think this is important. Maybe there're someone else who can give us advice.
> 
> Best regards,
> Zheng
> >

Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and 
qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.

Thanks,
Manish


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition
  2023-03-23 10:17     ` [EXT] " Manish Rangankar
@ 2023-03-24 15:21       ` Zheng Hacker
  0 siblings, 0 replies; 5+ messages in thread
From: Zheng Hacker @ 2023-03-24 15:21 UTC (permalink / raw)
  To: Manish Rangankar
  Cc: Mike Christie, Zheng Wang, Nilesh Javali,
	GR-QLogic-Storage-Upstream, jejb, martin.petersen, linux-scsi,
	linux-kernel, 1395428693sheep, alex000young

Manish Rangankar <mrangankar@marvell.com> 于2023年3月23日周四 18:17写道:
>
>
>
> > -----Original Message-----
> > From: Zheng Hacker <hackerzheng666@gmail.com>
> > Sent: Thursday, March 23, 2023 9:15 AM
> > To: Mike Christie <michael.christie@oracle.com>
> > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>;
> > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage-
> > Upstream <GR-QLogic-Storage-Upstream@marvell.com>;
> > jejb@linux.ibm.com; martin.petersen@oracle.com; linux-
> > scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > 1395428693sheep@gmail.com; alex000young@gmail.com
> > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> > qedi_remove due to race condition
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写
> > 道:
> > >
> > > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > > In qedi_probe, it calls __qedi_probe, which bound
> > > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > > &qedi->board_disable_work with qedi_board_disable_work.
> > > >
> > > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > > schedule_delayed_work to start the work.
> > > >
> > > > When we call qedi_remove to remove the driver, there may be a
> > > > sequence as follows:
> > > >
> > > > Fix it by finishing the work before cleanup in qedi_remove.
> > > >
> > > > CPU0                  CPU1
> > > >
> > > >                      |qedi_recovery_handler
> > > > qedi_remove          |
> > > >   __qedi_remove      |
> > > > iscsi_host_free      |
> > > > scsi_host_put        |
> > > > //free shost         |
> > > >                      |iscsi_host_for_each_session
> > > >                      |//use qedi->shost
> > > >
> > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > > ---
> > > >  drivers/scsi/qedi/qedi_main.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > > 100644
> > > > --- a/drivers/scsi/qedi/qedi_main.c
> > > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> > *pdev, int mode)
> > > >       int rval;
> > > >       u16 retry = 10;
> > > >
> > > > +     /*cancel work*/
> > >
> > > This comment is not needed. The name of the functions you are calling
> > > have "cancel" and "work" in them so we know. If you want to add a
> > > comment explain why the cancel calls are needed here.
> > >
> >
> > Hi,
> >
> > Sorry for my late reply and thanks for your advice. Will remove it in the next
> > version of patch.
> >
> > >
> > > > +     cancel_delayed_work_sync(&qedi->recovery_work);
> > > > +     cancel_delayed_work_sync(&qedi->board_disable_work);
> > >
> > >
> > > How do you know after you have called cancel_delayed_work_sync that
> > > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > > I don't know the qed driver well, but it looks like you could have
> > > operations still running, so after you cancel here one of those ops
> > > could lead to them scheduling the work again.
> > >
> >
> > Sorry I didn't know how to make sure there's no more schedule. But I do
> > think this is important. Maybe there're someone else who can give us advice.
> >
> > Best regards,
> > Zheng
> > >
>
> Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and
> qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.
>

Sorry for my late reply. Will apply that in next version.

Best reagrds,
Zheng

> Thanks,
> Manish
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-24 15:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-18  8:13 [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition Zheng Wang
2023-03-20 16:11 ` Mike Christie
2023-03-23  3:44   ` Zheng Hacker
2023-03-23 10:17     ` [EXT] " Manish Rangankar
2023-03-24 15:21       ` Zheng Hacker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.