stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
@ 2020-03-11 10:36 Sreekanth Reddy
  2020-03-11 11:04 ` Amit Shah
  2020-03-14  2:25 ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 10+ messages in thread
From: Sreekanth Reddy @ 2020-03-11 10:36 UTC (permalink / raw)
  To: martin.petersen
  Cc: linux-scsi, sathya.prakash, suganath-prabu.subramani, stable,
	amit, Sreekanth Reddy

Generic protection fault type kernel panic is observed when user
performs soft(ordered) HBA unplug operation while IOs are running
on drives connected to HBA.

When user performs ordered HBA removal operation then kernel calls
PCI device's .remove() call back function where driver is flushing out
all the outstanding SCSI IO commands with DID_NO_CONNECT host byte and
also un-maps sg buffers allocated for these IO commands.
But in the ordered HBA removal case (unlike of real HBA hot unplug)
HBA device is still alive and hence HBA hardware is performing the
DMA operations to those buffers on the system memory which are already
unmapped while flushing out the outstanding SCSI IO commands
and this leads to Kernel panic.

Fix:
Don't flush out the outstanding IOs from .remove() path in case of
ordered HBA removal since HBA will be still alive in this case and
it can complete the outstanding IOs. Flush out the outstanding IOs
only in case physical HBA hot unplug where their won't be any
communication with the HBA.

Cc: stable@vger.kernel.org
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 778d5e6..04a40af 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
 
 	ioc->remove_host = 1;
 
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
 
 	ioc->remove_host = 1;
 
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 
 	_scsih_fw_event_cleanup_queue(ioc);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 10:36 [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug Sreekanth Reddy
@ 2020-03-11 11:04 ` Amit Shah
  2020-03-11 11:25   ` Sreekanth Reddy
  2020-03-14  2:25 ` Elliott, Robert (Servers)
  1 sibling, 1 reply; 10+ messages in thread
From: Amit Shah @ 2020-03-11 11:04 UTC (permalink / raw)
  To: Sreekanth Reddy, martin.petersen
  Cc: linux-scsi, sathya.prakash, suganath-prabu.subramani, stable

On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
> Generic protection fault type kernel panic is observed when user
> performs soft(ordered) HBA unplug operation while IOs are running
> on drives connected to HBA.
> 
> When user performs ordered HBA removal operation then kernel calls
> PCI device's .remove() call back function where driver is flushing
> out
> all the outstanding SCSI IO commands with DID_NO_CONNECT host byte
> and
> also un-maps sg buffers allocated for these IO commands.
> But in the ordered HBA removal case (unlike of real HBA hot unplug)
> HBA device is still alive and hence HBA hardware is performing the
> DMA operations to those buffers on the system memory which are
> already
> unmapped while flushing out the outstanding SCSI IO commands
> and this leads to Kernel panic.
> 
> Fix:
> Don't flush out the outstanding IOs from .remove() path in case of
> ordered HBA removal since HBA will be still alive in this case and
> it can complete the outstanding IOs. Flush out the outstanding IOs
> only in case physical HBA hot unplug where their won't be any
> communication with the HBA.

Can you please point to the commit that introduces the bug?

> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 778d5e6..04a40af 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
>  
>  	ioc->remove_host = 1;
>  
> -	mpt3sas_wait_for_commands_to_complete(ioc);
> -	_scsih_flush_running_cmds(ioc);
> +	if (!pci_device_is_present(pdev))
> +		_scsih_flush_running_cmds(ioc);
>  
>  	_scsih_fw_event_cleanup_queue(ioc);
>  
> @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)

Just a note: this function is scsih_shutdown().  Doesn't block
application of the patch, though.  Just wondering how the patch was
created.

>  
>  	ioc->remove_host = 1;
>  
> -	mpt3sas_wait_for_commands_to_complete(ioc);
> -	_scsih_flush_running_cmds(ioc);
> +	if (!pci_device_is_present(pdev))
> +		_scsih_flush_running_cmds(ioc);
>  
>  	_scsih_fw_event_cleanup_queue(ioc);
>  


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 11:04 ` Amit Shah
@ 2020-03-11 11:25   ` Sreekanth Reddy
  2020-03-11 11:49     ` Sreekanth Reddy
  0 siblings, 1 reply; 10+ messages in thread
From: Sreekanth Reddy @ 2020-03-11 11:25 UTC (permalink / raw)
  To: Amit Shah
  Cc: Martin K. Petersen, linux-scsi, Sathya Prakash,
	Suganath Prabu Subramani, stable

On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@kernel.org> wrote:
>
> On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
> > Generic protection fault type kernel panic is observed when user
> > performs soft(ordered) HBA unplug operation while IOs are running
> > on drives connected to HBA.
> >
> > When user performs ordered HBA removal operation then kernel calls
> > PCI device's .remove() call back function where driver is flushing
> > out
> > all the outstanding SCSI IO commands with DID_NO_CONNECT host byte
> > and
> > also un-maps sg buffers allocated for these IO commands.
> > But in the ordered HBA removal case (unlike of real HBA hot unplug)
> > HBA device is still alive and hence HBA hardware is performing the
> > DMA operations to those buffers on the system memory which are
> > already
> > unmapped while flushing out the outstanding SCSI IO commands
> > and this leads to Kernel panic.
> >
> > Fix:
> > Don't flush out the outstanding IOs from .remove() path in case of
> > ordered HBA removal since HBA will be still alive in this case and
> > it can complete the outstanding IOs. Flush out the outstanding IOs
> > only in case physical HBA hot unplug where their won't be any
> > communication with the HBA.
>
> Can you please point to the commit that introduces the bug?

Sure I will add the commit ID which introduced this bug in the next patch.

>
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 778d5e6..04a40af 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
> >
> >       ioc->remove_host = 1;
> >
> > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > -     _scsih_flush_running_cmds(ioc);
> > +     if (!pci_device_is_present(pdev))
> > +             _scsih_flush_running_cmds(ioc);
> >
> >       _scsih_fw_event_cleanup_queue(ioc);
> >
> > @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
>
> Just a note: this function is scsih_shutdown().  Doesn't block
> application of the patch, though.  Just wondering how the patch was
> created.

Sorry I didn't get you. Can you please elaborate your query?

>
> >
> >       ioc->remove_host = 1;
> >
> > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > -     _scsih_flush_running_cmds(ioc);
> > +     if (!pci_device_is_present(pdev))
> > +             _scsih_flush_running_cmds(ioc);
> >
> >       _scsih_fw_event_cleanup_queue(ioc);
> >
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 11:25   ` Sreekanth Reddy
@ 2020-03-11 11:49     ` Sreekanth Reddy
  2020-03-11 14:48       ` Amit Shah
  2020-03-14  2:25       ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 10+ messages in thread
From: Sreekanth Reddy @ 2020-03-11 11:49 UTC (permalink / raw)
  To: Amit Shah
  Cc: Martin K. Petersen, linux-scsi, Sathya Prakash,
	Suganath Prabu Subramani, stable

On Wed, Mar 11, 2020 at 4:55 PM Sreekanth Reddy
<sreekanth.reddy@broadcom.com> wrote:
>
> On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@kernel.org> wrote:
> >
> > On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
> > > Generic protection fault type kernel panic is observed when user
> > > performs soft(ordered) HBA unplug operation while IOs are running
> > > on drives connected to HBA.
> > >
> > > When user performs ordered HBA removal operation then kernel calls
> > > PCI device's .remove() call back function where driver is flushing
> > > out
> > > all the outstanding SCSI IO commands with DID_NO_CONNECT host byte
> > > and
> > > also un-maps sg buffers allocated for these IO commands.
> > > But in the ordered HBA removal case (unlike of real HBA hot unplug)
> > > HBA device is still alive and hence HBA hardware is performing the
> > > DMA operations to those buffers on the system memory which are
> > > already
> > > unmapped while flushing out the outstanding SCSI IO commands
> > > and this leads to Kernel panic.
> > >
> > > Fix:
> > > Don't flush out the outstanding IOs from .remove() path in case of
> > > ordered HBA removal since HBA will be still alive in this case and
> > > it can complete the outstanding IOs. Flush out the outstanding IOs
> > > only in case physical HBA hot unplug where their won't be any
> > > communication with the HBA.
> >
> > Can you please point to the commit that introduces the bug?
>
> Sure I will add the commit ID which introduced this bug in the next patch.
>
> >
> > >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> > > ---
> > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > index 778d5e6..04a40af 100644
> > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
> > >
> > >       ioc->remove_host = 1;
> > >
> > > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > > -     _scsih_flush_running_cmds(ioc);
> > > +     if (!pci_device_is_present(pdev))
> > > +             _scsih_flush_running_cmds(ioc);
> > >
> > >       _scsih_fw_event_cleanup_queue(ioc);
> > >
> > > @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
> >
> > Just a note: this function is scsih_shutdown().  Doesn't block
> > application of the patch, though.  Just wondering how the patch was
> > created.

I got your query now,  yes this hunk change is in scsih_shutdown()
function. I am not sure why scsih_remove name is getting displayed
here in this hunk. I have used 'git format-patch' to generate the
patch.

>
> Sorry I didn't get you. Can you please elaborate your query?
>
> >
> > >
> > >       ioc->remove_host = 1;
> > >
> > > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > > -     _scsih_flush_running_cmds(ioc);
> > > +     if (!pci_device_is_present(pdev))
> > > +             _scsih_flush_running_cmds(ioc);
> > >
> > >       _scsih_fw_event_cleanup_queue(ioc);
> > >
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 11:49     ` Sreekanth Reddy
@ 2020-03-11 14:48       ` Amit Shah
  2020-03-14  2:25       ` Elliott, Robert (Servers)
  1 sibling, 0 replies; 10+ messages in thread
From: Amit Shah @ 2020-03-11 14:48 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, linux-scsi, Sathya Prakash,
	Suganath Prabu Subramani, stable

On Wed, 2020-03-11 at 17:19 +0530, Sreekanth Reddy wrote:
> On Wed, Mar 11, 2020 at 4:55 PM Sreekanth Reddy
> <sreekanth.reddy@broadcom.com> wrote:
> > 
> > On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@kernel.org> wrote:
> > > 
> > > On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
> > > > Generic protection fault type kernel panic is observed when
> > > > user
> > > > performs soft(ordered) HBA unplug operation while IOs are
> > > > running
> > > > on drives connected to HBA.
> > > > 
> > > > When user performs ordered HBA removal operation then kernel
> > > > calls
> > > > PCI device's .remove() call back function where driver is
> > > > flushing
> > > > out
> > > > all the outstanding SCSI IO commands with DID_NO_CONNECT host
> > > > byte
> > > > and
> > > > also un-maps sg buffers allocated for these IO commands.
> > > > But in the ordered HBA removal case (unlike of real HBA hot
> > > > unplug)
> > > > HBA device is still alive and hence HBA hardware is performing
> > > > the
> > > > DMA operations to those buffers on the system memory which are
> > > > already
> > > > unmapped while flushing out the outstanding SCSI IO commands
> > > > and this leads to Kernel panic.
> > > > 
> > > > Fix:
> > > > Don't flush out the outstanding IOs from .remove() path in case
> > > > of
> > > > ordered HBA removal since HBA will be still alive in this case
> > > > and
> > > > it can complete the outstanding IOs. Flush out the outstanding
> > > > IOs
> > > > only in case physical HBA hot unplug where their won't be any
> > > > communication with the HBA.
> > > 
> > > Can you please point to the commit that introduces the bug?
> > 
> > Sure I will add the commit ID which introduced this bug in the next
> > patch.

Thanks.

> > 
> > > 
> > > > 
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> > > > ---
> > > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > index 778d5e6..04a40af 100644
> > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev
> > > > *pdev)
> > > > 
> > > >       ioc->remove_host = 1;
> > > > 
> > > > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > > > -     _scsih_flush_running_cmds(ioc);
> > > > +     if (!pci_device_is_present(pdev))
> > > > +             _scsih_flush_running_cmds(ioc);
> > > > 
> > > >       _scsih_fw_event_cleanup_queue(ioc);
> > > > 
> > > > @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev
> > > > *pdev)
> > > 
> > > Just a note: this function is scsih_shutdown().  Doesn't block
> > > application of the patch, though.  Just wondering how the patch
> > > was
> > > created.
> 
> I got your query now,  yes this hunk change is in scsih_shutdown()
> function. I am not sure why scsih_remove name is getting displayed
> here in this hunk. I have used 'git format-patch' to generate the
> patch.

Thanks.  Does the commit description need an update as well?  It only
talks about remove callback.

> 
> > 
> > Sorry I didn't get you. Can you please elaborate your query?
> > 
> > > 
> > > > 
> > > >       ioc->remove_host = 1;
> > > > 
> > > > -     mpt3sas_wait_for_commands_to_complete(ioc);
> > > > -     _scsih_flush_running_cmds(ioc);
> > > > +     if (!pci_device_is_present(pdev))
> > > > +             _scsih_flush_running_cmds(ioc);
> > > > 
> > > >       _scsih_fw_event_cleanup_queue(ioc);
> > > > 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 11:49     ` Sreekanth Reddy
  2020-03-11 14:48       ` Amit Shah
@ 2020-03-14  2:25       ` Elliott, Robert (Servers)
  1 sibling, 0 replies; 10+ messages in thread
From: Elliott, Robert (Servers) @ 2020-03-14  2:25 UTC (permalink / raw)
  To: Sreekanth Reddy, Amit Shah
  Cc: Martin K. Petersen, linux-scsi, Sathya Prakash,
	Suganath Prabu Subramani, stable



> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org <linux-scsi-
> owner@vger.kernel.org> On Behalf Of Sreekanth Reddy
> Sent: Wednesday, March 11, 2020 6:50 AM
> To: Amit Shah <amit@kernel.org>
> Subject: Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA
> unplug
> 
> On Wed, Mar 11, 2020 at 4:55 PM Sreekanth Reddy
> <sreekanth.reddy@broadcom.com> wrote:
> >
> > On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@kernel.org> wrote:
...
> > > > @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev
> *pdev)
> > >
> > > Just a note: this function is scsih_shutdown().  Doesn't block
> > > application of the patch, though.  Just wondering how the patch was
> > > created.
> 
> I got your query now,  yes this hunk change is in scsih_shutdown()
> function. I am not sure why scsih_remove name is getting displayed
> here in this hunk. I have used 'git format-patch' to generate the
> patch.
> 

The scsih_shutdown() function definition is spread over three lines
(although it could easily fit on one line), while scsih_remove() was
the last function whose definition was on one line. git is apparently
not recognizing scsi_shutdown() as a function definition.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-11 10:36 [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug Sreekanth Reddy
  2020-03-11 11:04 ` Amit Shah
@ 2020-03-14  2:25 ` Elliott, Robert (Servers)
  2020-03-16  6:15   ` Sreekanth Reddy
  1 sibling, 1 reply; 10+ messages in thread
From: Elliott, Robert (Servers) @ 2020-03-14  2:25 UTC (permalink / raw)
  To: Sreekanth Reddy, martin.petersen
  Cc: linux-scsi, sathya.prakash, suganath-prabu.subramani, stable, amit



> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org <linux-scsi-
> owner@vger.kernel.org> On Behalf Of Sreekanth Reddy
> Sent: Wednesday, March 11, 2020 5:37 AM
> To: martin.petersen@oracle.com
> Cc: linux-scsi@vger.kernel.org; sathya.prakash@broadcom.com; suganath-
> prabu.subramani@broadcom.com; stable@vger.kernel.org; amit@kernel.org;
> Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> Subject: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
> 
> Generic protection fault type kernel panic is observed when user
> performs soft(ordered) HBA unplug operation while IOs are running
> on drives connected to HBA.
> 
> When user performs ordered HBA removal operation then kernel calls
> PCI device's .remove() call back function where driver is flushing out
> all the outstanding SCSI IO commands with DID_NO_CONNECT host byte and
> also un-maps sg buffers allocated for these IO commands.
> But in the ordered HBA removal case (unlike of real HBA hot unplug)
> HBA device is still alive and hence HBA hardware is performing the
> DMA operations to those buffers on the system memory which are already
> unmapped while flushing out the outstanding SCSI IO commands
> and this leads to Kernel panic.
> 
> Fix:
> Don't flush out the outstanding IOs from .remove() path in case of
> ordered HBA removal since HBA will be still alive in this case and
> it can complete the outstanding IOs. Flush out the outstanding IOs
> only in case physical HBA hot unplug where their won't be any
> communication with the HBA.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 778d5e6..04a40af 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
> 
>  	ioc->remove_host = 1;
> 
> -	mpt3sas_wait_for_commands_to_complete(ioc);

Immediately removing the driver with IOs pending seems dangerous. 

That function includes a timeout to avoid hanging forever, which
is reasonable (avoid hanging during system shutdown). Perhaps the
kernel panic was happening because that function timed out? 

Reporting a warning or error and doing special handling might be
appropriate if that occurs. That should be rare, though; the normal
case should be to cleanly finish any outstanding commands.

> -	_scsih_flush_running_cmds(ioc);
> +	if (!pci_device_is_present(pdev))
> +		_scsih_flush_running_cmds(ioc);

If that branch is not taken, then it proceeds to remove the driver
with IOs pending. That'll wipe out all sorts of ioc structures
and things like interrupt handler code, leaving memory mapped forever
(no code left to call scsi_dma_unmap). That might be better than
a kernel panic, but still not good.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-14  2:25 ` Elliott, Robert (Servers)
@ 2020-03-16  6:15   ` Sreekanth Reddy
  2020-03-27  1:50     ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Sreekanth Reddy @ 2020-03-16  6:15 UTC (permalink / raw)
  To: Elliott, Robert (Servers)
  Cc: martin.petersen, linux-scsi, sathya.prakash,
	suganath-prabu.subramani, stable, amit

On Sat, Mar 14, 2020 at 7:56 AM Elliott, Robert (Servers)
<elliott@hpe.com> wrote:
>
>
>
> > -----Original Message-----
> > From: linux-scsi-owner@vger.kernel.org <linux-scsi-
> > owner@vger.kernel.org> On Behalf Of Sreekanth Reddy
> > Sent: Wednesday, March 11, 2020 5:37 AM
> > To: martin.petersen@oracle.com
> > Cc: linux-scsi@vger.kernel.org; sathya.prakash@broadcom.com; suganath-
> > prabu.subramani@broadcom.com; stable@vger.kernel.org; amit@kernel.org;
> > Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> > Subject: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
> >
> > Generic protection fault type kernel panic is observed when user
> > performs soft(ordered) HBA unplug operation while IOs are running
> > on drives connected to HBA.
> >
> > When user performs ordered HBA removal operation then kernel calls
> > PCI device's .remove() call back function where driver is flushing out
> > all the outstanding SCSI IO commands with DID_NO_CONNECT host byte and
> > also un-maps sg buffers allocated for these IO commands.
> > But in the ordered HBA removal case (unlike of real HBA hot unplug)
> > HBA device is still alive and hence HBA hardware is performing the
> > DMA operations to those buffers on the system memory which are already
> > unmapped while flushing out the outstanding SCSI IO commands
> > and this leads to Kernel panic.
> >
> > Fix:
> > Don't flush out the outstanding IOs from .remove() path in case of
> > ordered HBA removal since HBA will be still alive in this case and
> > it can complete the outstanding IOs. Flush out the outstanding IOs
> > only in case physical HBA hot unplug where their won't be any
> > communication with the HBA.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 778d5e6..04a40af 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
> >
> >       ioc->remove_host = 1;
> >
> > -     mpt3sas_wait_for_commands_to_complete(ioc);
>
> Immediately removing the driver with IOs pending seems dangerous.
>
> That function includes a timeout to avoid hanging forever, which
> is reasonable (avoid hanging during system shutdown). Perhaps the
> kernel panic was happening because that function timed out?
>
> Reporting a warning or error and doing special handling might be
> appropriate if that occurs. That should be rare, though; the normal
> case should be to cleanly finish any outstanding commands.
>
> > -     _scsih_flush_running_cmds(ioc);
> > +     if (!pci_device_is_present(pdev))
> > +             _scsih_flush_running_cmds(ioc);
>
> If that branch is not taken, then it proceeds to remove the driver
> with IOs pending. That'll wipe out all sorts of ioc structures
> and things like interrupt handler code, leaving memory mapped forever
> (no code left to call scsi_dma_unmap). That might be better than
> a kernel panic, but still not good.

In the unload path driver call sas_remove_host() API before releasing
the resources. This sas_remove_host() API waits for all the
outstanding IOs to be completed. So here, indirectly driver is waiting
for the outstanding IOs to be processed before releasing the HBA
resources.  So only in the cases where HBA is inaccessible (e.g. HBA
unplug case), driver is flushing out the outstanding commands to avoid
SCSI error handling over head and can quilkey complete the driver
unload operation.

>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-16  6:15   ` Sreekanth Reddy
@ 2020-03-27  1:50     ` Martin K. Petersen
  2020-03-27 10:35       ` Sreekanth Reddy
  0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2020-03-27  1:50 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Elliott, Robert (Servers),
	martin.petersen, linux-scsi, sathya.prakash,
	suganath-prabu.subramani, stable, amit


Sreekanth,

> In the unload path driver call sas_remove_host() API before releasing
> the resources. This sas_remove_host() API waits for all the
> outstanding IOs to be completed. So here, indirectly driver is waiting
> for the outstanding IOs to be processed before releasing the HBA
> resources.  So only in the cases where HBA is inaccessible (e.g. HBA
> unplug case), driver is flushing out the outstanding commands to avoid
> SCSI error handling over head and can quilkey complete the driver
> unload operation.

None of this is clear from the commit description. Please resubmit patch
with a new description clarifying why and when it is safe to drop
outstanding commands.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug
  2020-03-27  1:50     ` Martin K. Petersen
@ 2020-03-27 10:35       ` Sreekanth Reddy
  0 siblings, 0 replies; 10+ messages in thread
From: Sreekanth Reddy @ 2020-03-27 10:35 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Elliott, Robert (Servers),
	linux-scsi, sathya.prakash, suganath-prabu.subramani, stable,
	amit

On Fri, Mar 27, 2020 at 7:20 AM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Sreekanth,
>
> > In the unload path driver call sas_remove_host() API before releasing
> > the resources. This sas_remove_host() API waits for all the
> > outstanding IOs to be completed. So here, indirectly driver is waiting
> > for the outstanding IOs to be processed before releasing the HBA
> > resources.  So only in the cases where HBA is inaccessible (e.g. HBA
> > unplug case), driver is flushing out the outstanding commands to avoid
> > SCSI error handling over head and can quilkey complete the driver
> > unload operation.
>
> None of this is clear from the commit description. Please resubmit patch
> with a new description clarifying why and when it is safe to drop
> outstanding commands.

Martin,

Posted the patch with updated description.
https://patchwork.kernel.org/patch/11462107/

Thanks & Regards,
Sreekanth

>
> --
> Martin K. Petersen      Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-03-27 10:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 10:36 [PATCH] mpt3sas: Fix kernel panic observed on soft HBA unplug Sreekanth Reddy
2020-03-11 11:04 ` Amit Shah
2020-03-11 11:25   ` Sreekanth Reddy
2020-03-11 11:49     ` Sreekanth Reddy
2020-03-11 14:48       ` Amit Shah
2020-03-14  2:25       ` Elliott, Robert (Servers)
2020-03-14  2:25 ` Elliott, Robert (Servers)
2020-03-16  6:15   ` Sreekanth Reddy
2020-03-27  1:50     ` Martin K. Petersen
2020-03-27 10:35       ` Sreekanth Reddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).