linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
@ 2016-05-13 20:28 Calvin Owens
  2016-05-13 21:17 ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 8+ messages in thread
From: Calvin Owens @ 2016-05-13 20:28 UTC (permalink / raw)
  To: Sathya Prakash, Chaitra P B, Suganath Prabu Subramani,
	James E.J. Bottomley, Martin K. Petersen
  Cc: MPT-FusionLinux.pdl, linux-scsi, linux-kernel, kernel-team, calvinowens

On the hardware I'm testing on, simply removing the mpt3sas module
triggers a litany of WARNs culminating in an OOPS:

  ------------[ cut here ]------------
  WARNING: CPU: 5 PID: 13348 at lib/kobject.c:244 kobject_add_internal+0x359/0x8a0
  kobject_add_internal failed for ArrayDevice09 (error: -2 parent: 6:0:15:0)
  CPU: 5 PID: 13348 Comm: rmmod Not tainted 4.6.0-rc2-mpt3sas-debug-00001-g7e7e6f4 #2
  Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBP6.6 11/20/2015
   ffffffff82b88ec0 ffff8806ce76f820 ffffffff81deff03 ffff8806ce76f898
   0000000000000000 ffff8806ce76f868 ffffffff811191e2 ffff880749819af8
   00000000000000f4 ffffed00d9cedf0f ffff880749819b08 ffff88074c9b8168
  Call Trace:
   [<ffffffff81deff03>] dump_stack+0x67/0x94
   [<ffffffff811191e2>] __warn+0x172/0x1b0
   [<ffffffff811192b7>] warn_slowpath_fmt+0x97/0xb0
   [<ffffffff81df72d9>] kobject_add_internal+0x359/0x8a0
   [<ffffffff81df792e>] kobject_add+0x10e/0x1c0
   [<ffffffff820b4cda>] device_add+0x30a/0x1490
   [<ffffffffa036a222>] enclosure_remove_device+0x172/0x1cc [enclosure]
   [<ffffffffa0390254>] ses_intf_remove+0x1c4/0x270 [ses]
   [<ffffffff820b1b3b>] device_del+0x2ab/0x680
   [<ffffffff820b1f22>] device_unregister+0x12/0x30
   [<ffffffff821264c5>] __scsi_remove_device+0x1d5/0x250
   [<ffffffff821225ec>] scsi_forget_host+0x12c/0x1e0
   [<ffffffff820ff4dc>] scsi_remove_host+0x10c/0x300
   [<ffffffffa0066c31>] scsih_remove+0x321/0x680 [mpt3sas]
   [<ffffffff81ebe5e0>] pci_device_remove+0x70/0x110
   [<ffffffff820bc7a0>] __device_release_driver+0x160/0x3a0
   [<ffffffff820bdd33>] driver_detach+0x183/0x200
   [<ffffffff820bbbbf>] bus_remove_driver+0xdf/0x200
   [<ffffffff820be847>] driver_unregister+0x67/0xa0
   [<ffffffff81ebc50e>] pci_unregister_driver+0x1e/0xe0
   [<ffffffffa0093e0a>] _mpt3sas_exit+0x23/0x219 [mpt3sas]
   [<ffffffff81291e7e>] SyS_delete_module+0x2ee/0x390
   [<ffffffff82924aa5>] entry_SYSCALL_64_fastpath+0x18/0xa8
  ---[ end trace fe163024b624f4af ]---

  general protection fault: 0000 [#1] SMP KASAN
  CPU: 6 PID: 17388 Comm: rmmod Tainted: G        W       4.6.0-rc2-00001-g7e7e6f4 #1
  Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBP6.6 11/20/2015
  task: ffff880753731740 ti: ffff8806896f0000 task.ti: ffff8806896f0000
  RIP: 0010:[<ffffffff8290972f>]  [<ffffffff8290972f>] klist_put+0x1f/0x160
  RSP: 0018:ffff8806896f7a10  EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff880753731f48
  RDX: 000000000000000b RSI: 0000000000000001 RDI: 0000000000000058
  RBP: ffff8806896f7a30 R08: 0000000000000006 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff880752bec000
  R13: 0000000000000001 R14: dffffc0000000000 R15: ffff880752bec2b0
  FS:  00007feef8ea6700(0000) GS:ffff88075ef80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fe45044d020 CR3: 00000006f7092000 CR4: 00000000001006e0
  Stack:
   0000000000000000 ffff880752bec000 0000000000000000 dffffc0000000000
   ffff8806896f7a40 ffffffff8290987e ffff8806896f7b00 ffffffff820aa3bd
   ffff8806896f7aa0 1ffff100d12def4f ffff880752bec390 ffffffff829183ca
  Call Trace:
   [<ffffffff8290987e>] klist_del+0xe/0x10
   [<ffffffff820aa3bd>] device_del+0x12d/0x680
   [<ffffffff820aa922>] device_unregister+0x12/0x30
   [<ffffffffa0368bb0>] enclosure_unregister+0xe0/0x170 [enclosure]
   [<ffffffffa0390220>] ses_intf_remove+0x190/0x270 [ses]
   [<ffffffff820aa53b>] device_del+0x2ab/0x680
   [<ffffffff820aa922>] device_unregister+0x12/0x30
   [<ffffffff8211ed65>] __scsi_remove_device+0x1d5/0x250
   [<ffffffff8211ae8c>] scsi_forget_host+0x12c/0x1e0
   [<ffffffff820f7d6c>] scsi_remove_host+0x10c/0x300
   [<ffffffffa00668f1>] scsih_remove+0x321/0x680 [mpt3sas]
   [<ffffffff81eb7170>] pci_device_remove+0x70/0x110
   [<ffffffff820b51a0>] __device_release_driver+0x160/0x3a0
   [<ffffffff820b6733>] driver_detach+0x183/0x200
   [<ffffffff820b45bf>] bus_remove_driver+0xdf/0x200
   [<ffffffff820b7247>] driver_unregister+0x67/0xa0
   [<ffffffff81eb509e>] pci_unregister_driver+0x1e/0xe0
   [<ffffffffa009359a>] _mpt3sas_exit+0x23/0xa89 [mpt3sas]
   [<ffffffff81291e7e>] SyS_delete_module+0x2ee/0x390
   [<ffffffff8291d1e5>] entry_SYSCALL_64_fastpath+0x18/0xa8

The issue is that enclosure_remove_device() expects to be able to re-add
the device that was previously enclosured: so with SES running, the order
we unwind things matters in a way it otherwise wouldn't.

Since mpt3sas deletes the SAS objects before the SCSI hosts, enclosure
ends up trying to re-parent the SCSI device from the enclosure to the SAS
PHY which has already been deleted. This obviously ends in sadness.

The fix appears to be simple: just call scsi_remove_host() before we call
sas_port_delete() and/or sas_remove_host().

Signed-off-by: Calvin Owens <calvinowens@fb.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index e0e4920..4aa128a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
 		_scsih_raid_device_remove(ioc, raid_device);
 	}
 
+	scsi_remove_host(shost);
+
 	/* free ports attached to the sas_host */
 	list_for_each_entry_safe(mpt3sas_port, next_port,
 	   &ioc->sas_hba.sas_port_list, port_list) {
@@ -8172,7 +8174,6 @@ void scsih_remove(struct pci_dev *pdev)
 	}
 
 	sas_remove_host(shost);
-	scsi_remove_host(shost);
 	mpt3sas_base_detach(ioc);
 	spin_lock(&gioc_lock);
 	list_del(&ioc->list);
-- 
2.8.0.rc2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-13 20:28 [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects Calvin Owens
@ 2016-05-13 21:17 ` Elliott, Robert (Persistent Memory)
  2016-05-16 20:25   ` Calvin Owens
  0 siblings, 1 reply; 8+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2016-05-13 21:17 UTC (permalink / raw)
  To: Calvin Owens, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen
  Cc: MPT-FusionLinux.pdl, linux-scsi, linux-kernel, kernel-team



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Calvin Owens
> Sent: Friday, May 13, 2016 3:28 PM
...
> Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
> objects
> 
...
> The issue is that enclosure_remove_device() expects to be able to re-add
> the device that was previously enclosured: so with SES running, the order
> we unwind things matters in a way it otherwise wouldn't.
> 
> Since mpt3sas deletes the SAS objects before the SCSI hosts, enclosure
> ends up trying to re-parent the SCSI device from the enclosure to the SAS
> PHY which has already been deleted. This obviously ends in sadness.
> 
> The fix appears to be simple: just call scsi_remove_host() before we call
> sas_port_delete() and/or sas_remove_host().
> 
...
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
>  		_scsih_raid_device_remove(ioc, raid_device);
>  	}
> 
> +	scsi_remove_host(shost);
> +
>  	/* free ports attached to the sas_host */
>  	list_for_each_entry_safe(mpt3sas_port, next_port,
>  	   &ioc->sas_hba.sas_port_list, port_list) {
> @@ -8172,7 +8174,6 @@ void scsih_remove(struct pci_dev *pdev)
>  	}
> 
>  	sas_remove_host(shost);
> -	scsi_remove_host(shost);
>  	mpt3sas_base_detach(ioc);
>  	spin_lock(&gioc_lock);
>  	list_del(&ioc->list);

That change matches the pattern of all other drivers calling
sas_remove_host, except for one:
	drivers/message/fusion/mptsas.c

That consensus means this comment in drivers/scsi/scsi_transport_sas.c
is wrong:

/**
 * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
 * @shost:      Scsi Host that is torn down
 *
 * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
 * Must be called just before scsi_remove_host for SAS HBAs.
 */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-13 21:17 ` Elliott, Robert (Persistent Memory)
@ 2016-05-16 20:25   ` Calvin Owens
  2016-05-16 21:51     ` Sathya Prakash Veerichetty
  0 siblings, 1 reply; 8+ messages in thread
From: Calvin Owens @ 2016-05-16 20:25 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory)
  Cc: Sathya Prakash, Chaitra P B, Suganath Prabu Subramani,
	James E.J. Bottomley, Martin K. Petersen, MPT-FusionLinux.pdl,
	linux-scsi, linux-kernel, kernel-team, calvinowens

On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
> 
> 
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Calvin Owens
> > Sent: Friday, May 13, 2016 3:28 PM
> ...
> > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
> > objects
> > 
> ...
> > The issue is that enclosure_remove_device() expects to be able to re-add
> > the device that was previously enclosured: so with SES running, the order
> > we unwind things matters in a way it otherwise wouldn't.
> > 
> > Since mpt3sas deletes the SAS objects before the SCSI hosts, enclosure
> > ends up trying to re-parent the SCSI device from the enclosure to the SAS
> > PHY which has already been deleted. This obviously ends in sadness.
> > 
> > The fix appears to be simple: just call scsi_remove_host() before we call
> > sas_port_delete() and/or sas_remove_host().
> > 
> ...
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
> >  		_scsih_raid_device_remove(ioc, raid_device);
> >  	}
> > 
> > +	scsi_remove_host(shost);
> > +
> >  	/* free ports attached to the sas_host */
> >  	list_for_each_entry_safe(mpt3sas_port, next_port,
> >  	   &ioc->sas_hba.sas_port_list, port_list) {
> > @@ -8172,7 +8174,6 @@ void scsih_remove(struct pci_dev *pdev)
> >  	}
> > 
> >  	sas_remove_host(shost);
> > -	scsi_remove_host(shost);
> >  	mpt3sas_base_detach(ioc);
> >  	spin_lock(&gioc_lock);
> >  	list_del(&ioc->list);
> 
> That change matches the pattern of all other drivers calling
> sas_remove_host, except for one:
> 	drivers/message/fusion/mptsas.c
> 
> That consensus means this comment in drivers/scsi/scsi_transport_sas.c
> is wrong:
> 
> /**
>  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
>  * @shost:      Scsi Host that is torn down
>  *
>  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
>  * Must be called just before scsi_remove_host for SAS HBAs.
>  */

Yes, exactly: that comment appears to be backwards, and as you say most
drivers appear to do the opposite (I looked at HPSA at least, which calls
sas_port_delete() before scsi_remove_host()).

I suppose I should send a patch to fix the comment as well? It'd be nice
to get some testing to be certain I'm not breaking somebody else's setup
by fixing mine though...

Thanks,
Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-16 20:25   ` Calvin Owens
@ 2016-05-16 21:51     ` Sathya Prakash Veerichetty
  2016-05-17  3:13       ` Calvin Owens
  0 siblings, 1 reply; 8+ messages in thread
From: Sathya Prakash Veerichetty @ 2016-05-16 21:51 UTC (permalink / raw)
  To: Calvin Owens, Elliott, Robert (Persistent Memory)
  Cc: Chaitra Basappa, Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen, PDL-MPT-FUSIONLINUX, linux-scsi,
	linux-kernel, kernel-team

Our understanding is the relationship between the SCSI host and SAS end
devices is a parent-child and before ripping of SCSI host we need to rip of
all the children. Why the enclosure ends up trying to re-parent the SCSI
device from the enclosure to the SAS PHY even after we remove the SAS Phy?.
Doesn't this need to be taken care in enclosure_removal?

-----Original Message-----
From: Calvin Owens [mailto:calvinowens@fb.com]
Sent: Monday, May 16, 2016 2:25 PM
To: Elliott, Robert (Persistent Memory)
Cc: Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; James E.J.
Bottomley; Martin K. Petersen; MPT-FusionLinux.pdl@broadcom.com;
linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
kernel-team@fb.com; calvinowens@fb.com
Subject: Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
objects

On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
>
>
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Calvin Owens
> > Sent: Friday, May 13, 2016 3:28 PM
> ...
> > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS
> > PHY objects
> >
> ...
> > The issue is that enclosure_remove_device() expects to be able to
> > re-add the device that was previously enclosured: so with SES
> > running, the order we unwind things matters in a way it otherwise
> > wouldn't.
> >
> > Since mpt3sas deletes the SAS objects before the SCSI hosts,
> > enclosure ends up trying to re-parent the SCSI device from the
> > enclosure to the SAS PHY which has already been deleted. This obviously
> > ends in sadness.
> >
> > The fix appears to be simple: just call scsi_remove_host() before we
> > call
> > sas_port_delete() and/or sas_remove_host().
> >
> ...
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
> >  		_scsih_raid_device_remove(ioc, raid_device);
> >  	}
> >
> > +	scsi_remove_host(shost);
> > +
> >  	/* free ports attached to the sas_host */
> >  	list_for_each_entry_safe(mpt3sas_port, next_port,
> >  	   &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@
> > void scsih_remove(struct pci_dev *pdev)
> >  	}
> >
> >  	sas_remove_host(shost);
> > -	scsi_remove_host(shost);
> >  	mpt3sas_base_detach(ioc);
> >  	spin_lock(&gioc_lock);
> >  	list_del(&ioc->list);
>
> That change matches the pattern of all other drivers calling
> sas_remove_host, except for one:
> 	drivers/message/fusion/mptsas.c
>
> That consensus means this comment in drivers/scsi/scsi_transport_sas.c
> is wrong:
>
> /**
>  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
>  * @shost:      Scsi Host that is torn down
>  *
>  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
>  * Must be called just before scsi_remove_host for SAS HBAs.
>  */

Yes, exactly: that comment appears to be backwards, and as you say most
drivers appear to do the opposite (I looked at HPSA at least, which calls
sas_port_delete() before scsi_remove_host()).

I suppose I should send a patch to fix the comment as well? It'd be nice to
get some testing to be certain I'm not breaking somebody else's setup by
fixing mine though...

Thanks,
Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-16 21:51     ` Sathya Prakash Veerichetty
@ 2016-05-17  3:13       ` Calvin Owens
  2016-05-18 13:14         ` Sreekanth Reddy
  0 siblings, 1 reply; 8+ messages in thread
From: Calvin Owens @ 2016-05-17  3:13 UTC (permalink / raw)
  To: Sathya Prakash Veerichetty
  Cc: Elliott, Robert (Persistent Memory),
	Chaitra Basappa, Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen, PDL-MPT-FUSIONLINUX, linux-scsi,
	linux-kernel, kernel-team

On Monday 05/16 at 15:51 -0600, Sathya Prakash Veerichetty wrote:
> Our understanding is the relationship between the SCSI host and SAS end
> devices is a parent-child and before ripping of SCSI host we need to rip of
> all the children.

I know at least HPSA does the opposite, Elliott observes that other
drivers do as well.

> Why the enclosure ends up trying to re-parent the SCSI device from the
> enclosure to the SAS PHY even after we remove the SAS Phy?

Take this path in /sys:

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/scsi_device/6:0:0:0

The problem is "port-6:0:0" in the hierarchy: since mpt3sas explicitly
calls sas_port_delete() before scsi_remove_host(), that node is removed
from the device hierarchy and seems to orphan what lies beneath it:

[  955.119328] kobject: '6:0:0:0' (ffff88074e2e1008): kobject_uevent_env
[  955.133890] kobject: '6:0:0:0' (ffff88074e2e1008): fill_kobj_path: path = '/end_device-6:0:0/target6:0:0/6:0:0:0/bsg/6:0:0:0'
<WARN/OOPS sadness follows>

As the above from CONFIG_DEBUG_KOBJECT suggests, "end_device-6:0:0" no longer
has a parent. The path to the enclosured device is:

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:15/end_device-6:0:15/target6:0:15/6:0:15:0/enclosure/6:0:15:0/ArrayDevice09

Enclosure wants to re-add the device outside the enclosure:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/enclosure.c#n404

We melt when trying to re-add "ArrayDevice09" to the "6:0:15:0" directory
above the "enclosure" directory best I can tell. This is *incredibly*
confusing because kobj_add_internal() actually clobbers the parent pointer
if it's NULL:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/kobject.c#n216

So even though it appears that the parent was "6:0:15:0" here, it was
actually NULL when passed into kobj_add_internal(). Thus, we try to do
the add with the kset as our parent, and we get -ENOENT because that
node has been marked as inactive in kernfs as per here:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n743

Why is it inactive? I think, because __kernfs_remove() deleted an ancestor:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n1243

...which brings us full circle.

> Doesn't this need to be taken care in enclosure_removal?

Even with enclosure compiled out, I still get a WARN like the following
for every block device being removed at rmmod (-dirty here is just from me
applying a fix to SES I sent to the list a few days ago):

[  224.057286] ------------[ cut here ]------------
[  224.067696] WARNING: CPU: 1 PID: 5412 at fs/sysfs/group.c:237 sysfs_remove_group+0x10f/0x150
[  224.088007] sysfs group ffffffff83109ba0 not found for kobject 'sdc'
[  224.102283] Modules linked in: mpt3sas(-) <snip>
[  224.200483] CPU: 0 PID: 5412 Comm: rmmod Not tainted 4.6.0-dirty #4
[  224.219797] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
[  224.237001]  ffffffff82af6280 ffff8806e826f8b0 ffffffff81e13e54 ffff8806e826f928
[  224.253811]  0000000000000000 ffff8806e826f8f8 ffffffff81127822 ffffffff831a5968
[  224.270647]  ffff8806000000ed ffffed00dd04df21 0000000000000000 ffff880754ae3078
[  224.287462] Call Trace:
[  224.292953]  [<ffffffff81e13e54>] dump_stack+0x68/0x94
[  224.304433]  [<ffffffff81127822>] __warn+0x172/0x1b0
[  224.315514]  [<ffffffff811278f7>] warn_slowpath_fmt+0x97/0xb0
[  224.351845]  [<ffffffff816bbc1f>] sysfs_remove_group+0x10f/0x150
[  224.365252]  [<ffffffff81377814>] blk_trace_remove_sysfs+0x14/0x20
[  224.379021]  [<ffffffff81d966bf>] blk_unregister_queue+0xcf/0x120
[  224.392589]  [<ffffffff81dc8c7e>] del_gendisk+0x26e/0x600
[  224.432345]  [<ffffffff8215ea75>] sd_remove+0xf5/0x1b0
[  224.443808]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
[  224.458151]  [<ffffffff820dd475>] device_release_driver+0x25/0x40
[  224.471734]  [<ffffffff820dbd81>] bus_remove_device+0x2e1/0x4b0
[  224.484926]  [<ffffffff820d25ad>] device_del+0x37d/0x680
[  224.555423]  [<ffffffff82148225>] __scsi_remove_device+0x1e5/0x250
[  224.569196]  [<ffffffff821443bc>] scsi_forget_host+0x12c/0x1e0
[  224.582197]  [<ffffffff82120b6c>] scsi_remove_host+0x10c/0x300
[  224.595227]  [<ffffffffa0066cc9>] scsih_remove+0x339/0x670 [mpt3sas]
[  224.609383]  [<ffffffff81edd400>] pci_device_remove+0x70/0x110
[  224.622379]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
[  224.636724]  [<ffffffff820de7b3>] driver_detach+0x183/0x200
[  224.649166]  [<ffffffff820dc61f>] bus_remove_driver+0xdf/0x200
[  224.662185]  [<ffffffff820df2e7>] driver_unregister+0x67/0xa0
[  224.675009]  [<ffffffff81edb2ee>] pci_unregister_driver+0x1e/0xe0
[  224.688613]  [<ffffffffa0093a3a>] _mpt3sas_exit+0x23/0x5e9 [mpt3sas]
[  224.702782]  [<ffffffff812a2d1b>] SyS_delete_module+0x2eb/0x390
[  224.743359]  [<ffffffff829529e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[  224.759036] ---[ end trace 13f1919bf2ec7bb0 ]---

My patch fixes that too :)

The only driver besides mpt*sas that calls sas_delete_port() explicitly is
HPSA, and it does it in the opposite order mpt3sas does: scsi_remove_host()
first.

Thanks,
Calvin

> -----Original Message-----
> From: Calvin Owens [mailto:calvinowens@fb.com]
> Sent: Monday, May 16, 2016 2:25 PM
> To: Elliott, Robert (Persistent Memory)
> Cc: Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; James E.J.
> Bottomley; Martin K. Petersen; MPT-FusionLinux.pdl@broadcom.com;
> linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> kernel-team@fb.com; calvinowens@fb.com
> Subject: Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
> objects
> 
> On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
> >
> >
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > > owner@vger.kernel.org] On Behalf Of Calvin Owens
> > > Sent: Friday, May 13, 2016 3:28 PM
> > ...
> > > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS
> > > PHY objects
> > >
> > ...
> > > The issue is that enclosure_remove_device() expects to be able to
> > > re-add the device that was previously enclosured: so with SES
> > > running, the order we unwind things matters in a way it otherwise
> > > wouldn't.
> > >
> > > Since mpt3sas deletes the SAS objects before the SCSI hosts,
> > > enclosure ends up trying to re-parent the SCSI device from the
> > > enclosure to the SAS PHY which has already been deleted. This obviously
> > > ends in sadness.
> > >
> > > The fix appears to be simple: just call scsi_remove_host() before we
> > > call
> > > sas_port_delete() and/or sas_remove_host().
> > >
> > ...
> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
> > >  		_scsih_raid_device_remove(ioc, raid_device);
> > >  	}
> > >
> > > +	scsi_remove_host(shost);
> > > +
> > >  	/* free ports attached to the sas_host */
> > >  	list_for_each_entry_safe(mpt3sas_port, next_port,
> > >  	   &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@
> > > void scsih_remove(struct pci_dev *pdev)
> > >  	}
> > >
> > >  	sas_remove_host(shost);
> > > -	scsi_remove_host(shost);
> > >  	mpt3sas_base_detach(ioc);
> > >  	spin_lock(&gioc_lock);
> > >  	list_del(&ioc->list);
> >
> > That change matches the pattern of all other drivers calling
> > sas_remove_host, except for one:
> > 	drivers/message/fusion/mptsas.c
> >
> > That consensus means this comment in drivers/scsi/scsi_transport_sas.c
> > is wrong:
> >
> > /**
> >  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
> >  * @shost:      Scsi Host that is torn down
> >  *
> >  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
> >  * Must be called just before scsi_remove_host for SAS HBAs.
> >  */
> 
> Yes, exactly: that comment appears to be backwards, and as you say most
> drivers appear to do the opposite (I looked at HPSA at least, which calls
> sas_port_delete() before scsi_remove_host()).
> 
> I suppose I should send a patch to fix the comment as well? It'd be nice to
> get some testing to be certain I'm not breaking somebody else's setup by
> fixing mine though...
> 
> Thanks,
> Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-17  3:13       ` Calvin Owens
@ 2016-05-18 13:14         ` Sreekanth Reddy
  2016-05-18 17:49           ` Calvin Owens
  0 siblings, 1 reply; 8+ messages in thread
From: Sreekanth Reddy @ 2016-05-18 13:14 UTC (permalink / raw)
  To: Calvin Owens
  Cc: Sathya Prakash Veerichetty, Elliott, Robert (Persistent Memory),
	Chaitra Basappa, Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen, PDL-MPT-FUSIONLINUX, linux-scsi,
	linux-kernel, kernel-team

On Tue, May 17, 2016 at 8:43 AM, Calvin Owens <calvinowens@fb.com> wrote:
> On Monday 05/16 at 15:51 -0600, Sathya Prakash Veerichetty wrote:
>> Our understanding is the relationship between the SCSI host and SAS end
>> devices is a parent-child and before ripping of SCSI host we need to rip of
>> all the children.
>
> I know at least HPSA does the opposite, Elliott observes that other
> drivers do as well.
>
>> Why the enclosure ends up trying to re-parent the SCSI device from the
>> enclosure to the SAS PHY even after we remove the SAS Phy?
>
> Take this path in /sys:
>
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/scsi_device/6:0:0:0
>
> The problem is "port-6:0:0" in the hierarchy: since mpt3sas explicitly
> calls sas_port_delete() before scsi_remove_host(), that node is removed
> from the device hierarchy and seems to orphan what lies beneath it:
>
> [  955.119328] kobject: '6:0:0:0' (ffff88074e2e1008): kobject_uevent_env
> [  955.133890] kobject: '6:0:0:0' (ffff88074e2e1008): fill_kobj_path: path = '/end_device-6:0:0/target6:0:0/6:0:0:0/bsg/6:0:0:0'
> <WARN/OOPS sadness follows>

[Sreekanth] Calvin, Now with your patch we are observing same type of
<WARN/OOPS> while unregistered the drive with scsi_transport layer
after calling scsi_remove_host() on the same driver unload path,

kobject: 'end_device-2:0' (ffff8800dad0c010): kobject_uevent_env
kobject: 'end_device-2:0' (ffff8800dad0c010): fill_kobj_path: path =
'/host2/port-2:0/end_device-2:0/bsg/end_device-2:0'

As scsi_remove_host() has freed it's parent
"/devices/pci0000:00/0000:00:07.0" we are observing these warnings.

here is it's complete path

 fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/bsg/end_device-2:0',

Until on  ~3.19/4.0 kernel all the thing are working fine, we haven't
seen these types of kernel warning. Also we haven't done any changes
in mpt3sas driver w.r.t to this after 4.0 kernel.

>
> As the above from CONFIG_DEBUG_KOBJECT suggests, "end_device-6:0:0" no longer
> has a parent. The path to the enclosured device is:
>
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:15/end_device-6:0:15/target6:0:15/6:0:15:0/enclosure/6:0:15:0/ArrayDevice09
>
> Enclosure wants to re-add the device outside the enclosure:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/enclosure.c#n404
>
> We melt when trying to re-add "ArrayDevice09" to the "6:0:15:0" directory
> above the "enclosure" directory best I can tell. This is *incredibly*
> confusing because kobj_add_internal() actually clobbers the parent pointer
> if it's NULL:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/kobject.c#n216
>
> So even though it appears that the parent was "6:0:15:0" here, it was
> actually NULL when passed into kobj_add_internal(). Thus, we try to do
> the add with the kset as our parent, and we get -ENOENT because that
> node has been marked as inactive in kernfs as per here:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n743
>
> Why is it inactive? I think, because __kernfs_remove() deleted an ancestor:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n1243
>
> ...which brings us full circle.
>
>> Doesn't this need to be taken care in enclosure_removal?
>
> Even with enclosure compiled out, I still get a WARN like the following
> for every block device being removed at rmmod (-dirty here is just from me
> applying a fix to SES I sent to the list a few days ago):
>
> [  224.057286] ------------[ cut here ]------------
> [  224.067696] WARNING: CPU: 1 PID: 5412 at fs/sysfs/group.c:237 sysfs_remove_group+0x10f/0x150
> [  224.088007] sysfs group ffffffff83109ba0 not found for kobject 'sdc'
> [  224.102283] Modules linked in: mpt3sas(-) <snip>
> [  224.200483] CPU: 0 PID: 5412 Comm: rmmod Not tainted 4.6.0-dirty #4
> [  224.219797] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
> [  224.237001]  ffffffff82af6280 ffff8806e826f8b0 ffffffff81e13e54 ffff8806e826f928
> [  224.253811]  0000000000000000 ffff8806e826f8f8 ffffffff81127822 ffffffff831a5968
> [  224.270647]  ffff8806000000ed ffffed00dd04df21 0000000000000000 ffff880754ae3078
> [  224.287462] Call Trace:
> [  224.292953]  [<ffffffff81e13e54>] dump_stack+0x68/0x94
> [  224.304433]  [<ffffffff81127822>] __warn+0x172/0x1b0
> [  224.315514]  [<ffffffff811278f7>] warn_slowpath_fmt+0x97/0xb0
> [  224.351845]  [<ffffffff816bbc1f>] sysfs_remove_group+0x10f/0x150
> [  224.365252]  [<ffffffff81377814>] blk_trace_remove_sysfs+0x14/0x20
> [  224.379021]  [<ffffffff81d966bf>] blk_unregister_queue+0xcf/0x120
> [  224.392589]  [<ffffffff81dc8c7e>] del_gendisk+0x26e/0x600
> [  224.432345]  [<ffffffff8215ea75>] sd_remove+0xf5/0x1b0
> [  224.443808]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
> [  224.458151]  [<ffffffff820dd475>] device_release_driver+0x25/0x40
> [  224.471734]  [<ffffffff820dbd81>] bus_remove_device+0x2e1/0x4b0
> [  224.484926]  [<ffffffff820d25ad>] device_del+0x37d/0x680
> [  224.555423]  [<ffffffff82148225>] __scsi_remove_device+0x1e5/0x250
> [  224.569196]  [<ffffffff821443bc>] scsi_forget_host+0x12c/0x1e0
> [  224.582197]  [<ffffffff82120b6c>] scsi_remove_host+0x10c/0x300
> [  224.595227]  [<ffffffffa0066cc9>] scsih_remove+0x339/0x670 [mpt3sas]
> [  224.609383]  [<ffffffff81edd400>] pci_device_remove+0x70/0x110
> [  224.622379]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
> [  224.636724]  [<ffffffff820de7b3>] driver_detach+0x183/0x200
> [  224.649166]  [<ffffffff820dc61f>] bus_remove_driver+0xdf/0x200
> [  224.662185]  [<ffffffff820df2e7>] driver_unregister+0x67/0xa0
> [  224.675009]  [<ffffffff81edb2ee>] pci_unregister_driver+0x1e/0xe0
> [  224.688613]  [<ffffffffa0093a3a>] _mpt3sas_exit+0x23/0x5e9 [mpt3sas]
> [  224.702782]  [<ffffffff812a2d1b>] SyS_delete_module+0x2eb/0x390
> [  224.743359]  [<ffffffff829529e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> [  224.759036] ---[ end trace 13f1919bf2ec7bb0 ]---
>
> My patch fixes that too :)
>
> The only driver besides mpt*sas that calls sas_delete_port() explicitly is
> HPSA, and it does it in the opposite order mpt3sas does: scsi_remove_host()
> first.
>
> Thanks,
> Calvin
>
>> -----Original Message-----
>> From: Calvin Owens [mailto:calvinowens@fb.com]
>> Sent: Monday, May 16, 2016 2:25 PM
>> To: Elliott, Robert (Persistent Memory)
>> Cc: Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; James E.J.
>> Bottomley; Martin K. Petersen; MPT-FusionLinux.pdl@broadcom.com;
>> linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
>> kernel-team@fb.com; calvinowens@fb.com
>> Subject: Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
>> objects
>>
>> On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> > > owner@vger.kernel.org] On Behalf Of Calvin Owens
>> > > Sent: Friday, May 13, 2016 3:28 PM
>> > ...
>> > > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS
>> > > PHY objects
>> > >
>> > ...
>> > > The issue is that enclosure_remove_device() expects to be able to
>> > > re-add the device that was previously enclosured: so with SES
>> > > running, the order we unwind things matters in a way it otherwise
>> > > wouldn't.
>> > >
>> > > Since mpt3sas deletes the SAS objects before the SCSI hosts,
>> > > enclosure ends up trying to re-parent the SCSI device from the
>> > > enclosure to the SAS PHY which has already been deleted. This obviously
>> > > ends in sadness.
>> > >
>> > > The fix appears to be simple: just call scsi_remove_host() before we
>> > > call
>> > > sas_port_delete() and/or sas_remove_host().
>> > >
>> > ...
>> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
>> > > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
>> > >           _scsih_raid_device_remove(ioc, raid_device);
>> > >   }
>> > >
>> > > + scsi_remove_host(shost);
>> > > +
>> > >   /* free ports attached to the sas_host */
>> > >   list_for_each_entry_safe(mpt3sas_port, next_port,
>> > >      &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@
>> > > void scsih_remove(struct pci_dev *pdev)
>> > >   }
>> > >
>> > >   sas_remove_host(shost);
>> > > - scsi_remove_host(shost);
>> > >   mpt3sas_base_detach(ioc);
>> > >   spin_lock(&gioc_lock);
>> > >   list_del(&ioc->list);
>> >
>> > That change matches the pattern of all other drivers calling
>> > sas_remove_host, except for one:
>> >     drivers/message/fusion/mptsas.c
>> >
>> > That consensus means this comment in drivers/scsi/scsi_transport_sas.c
>> > is wrong:
>> >
>> > /**
>> >  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
>> >  * @shost:      Scsi Host that is torn down
>> >  *
>> >  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
>> >  * Must be called just before scsi_remove_host for SAS HBAs.
>> >  */
>>
>> Yes, exactly: that comment appears to be backwards, and as you say most
>> drivers appear to do the opposite (I looked at HPSA at least, which calls
>> sas_port_delete() before scsi_remove_host()).
>>
>> I suppose I should send a patch to fix the comment as well? It'd be nice to
>> get some testing to be certain I'm not breaking somebody else's setup by
>> fixing mine though...
>>
>> Thanks,
>> Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-18 13:14         ` Sreekanth Reddy
@ 2016-05-18 17:49           ` Calvin Owens
  2016-05-19  4:06             ` Sreekanth Reddy
  0 siblings, 1 reply; 8+ messages in thread
From: Calvin Owens @ 2016-05-18 17:49 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Sathya Prakash Veerichetty, Elliott, Robert (Persistent Memory),
	Chaitra Basappa, Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen, PDL-MPT-FUSIONLINUX, linux-scsi,
	linux-kernel, kernel-team, calvinowens

On Wednesday 05/18 at 18:44 +0530, Sreekanth Reddy wrote:
> On Tue, May 17, 2016 at 8:43 AM, Calvin Owens <calvinowens@fb.com> wrote:
> > On Monday 05/16 at 15:51 -0600, Sathya Prakash Veerichetty wrote:
> >> Our understanding is the relationship between the SCSI host and SAS end
> >> devices is a parent-child and before ripping of SCSI host we need to rip of
> >> all the children.
> >
> > I know at least HPSA does the opposite, Elliott observes that other
> > drivers do as well.
> >
> >> Why the enclosure ends up trying to re-parent the SCSI device from the
> >> enclosure to the SAS PHY even after we remove the SAS Phy?
> >
> > Take this path in /sys:
> >
> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/scsi_device/6:0:0:0
> >
> > The problem is "port-6:0:0" in the hierarchy: since mpt3sas explicitly
> > calls sas_port_delete() before scsi_remove_host(), that node is removed
> > from the device hierarchy and seems to orphan what lies beneath it:
> >
> > [  955.119328] kobject: '6:0:0:0' (ffff88074e2e1008): kobject_uevent_env
> > [  955.133890] kobject: '6:0:0:0' (ffff88074e2e1008): fill_kobj_path: path = '/end_device-6:0:0/target6:0:0/6:0:0:0/bsg/6:0:0:0'
> > <WARN/OOPS sadness follows>
> 
> [Sreekanth] Calvin, Now with your patch we are observing same type of
> <WARN/OOPS> while unregistered the drive with scsi_transport layer
> after calling scsi_remove_host() on the same driver unload path,
> 
> kobject: 'end_device-2:0' (ffff8800dad0c010): kobject_uevent_env
> kobject: 'end_device-2:0' (ffff8800dad0c010): fill_kobj_path: path =
> '/host2/port-2:0/end_device-2:0/bsg/end_device-2:0'
> 
> As scsi_remove_host() has freed it's parent
> "/devices/pci0000:00/0000:00:07.0" we are observing these warnings.
> 
> here is it's complete path
> 
>  fill_kobj_path: path =
> '/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/bsg/end_device-2:0',

Could you post the stacktraces for your WARN/OOPS?

> Until on  ~3.19/4.0 kernel all the thing are working fine, we haven't
> seen these types of kernel warning. Also we haven't done any changes
> in mpt3sas driver w.r.t to this after 4.0 kernel.

I'll dig around more. In any case, my patch seems to be papering over
the problem rather than solving it, so we need a different fix.

Thanks,
Calvin

> >
> > As the above from CONFIG_DEBUG_KOBJECT suggests, "end_device-6:0:0" no longer
> > has a parent. The path to the enclosured device is:
> >
> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:15/end_device-6:0:15/target6:0:15/6:0:15:0/enclosure/6:0:15:0/ArrayDevice09
> >
> > Enclosure wants to re-add the device outside the enclosure:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/enclosure.c#n404
> >
> > We melt when trying to re-add "ArrayDevice09" to the "6:0:15:0" directory
> > above the "enclosure" directory best I can tell. This is *incredibly*
> > confusing because kobj_add_internal() actually clobbers the parent pointer
> > if it's NULL:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/kobject.c#n216
> >
> > So even though it appears that the parent was "6:0:15:0" here, it was
> > actually NULL when passed into kobj_add_internal(). Thus, we try to do
> > the add with the kset as our parent, and we get -ENOENT because that
> > node has been marked as inactive in kernfs as per here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n743
> >
> > Why is it inactive? I think, because __kernfs_remove() deleted an ancestor:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n1243
> >
> > ...which brings us full circle.
> >
> >> Doesn't this need to be taken care in enclosure_removal?
> >
> > Even with enclosure compiled out, I still get a WARN like the following
> > for every block device being removed at rmmod (-dirty here is just from me
> > applying a fix to SES I sent to the list a few days ago):
> >
> > [  224.057286] ------------[ cut here ]------------
> > [  224.067696] WARNING: CPU: 1 PID: 5412 at fs/sysfs/group.c:237 sysfs_remove_group+0x10f/0x150
> > [  224.088007] sysfs group ffffffff83109ba0 not found for kobject 'sdc'
> > [  224.102283] Modules linked in: mpt3sas(-) <snip>
> > [  224.200483] CPU: 0 PID: 5412 Comm: rmmod Not tainted 4.6.0-dirty #4
> > [  224.219797] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
> > [  224.237001]  ffffffff82af6280 ffff8806e826f8b0 ffffffff81e13e54 ffff8806e826f928
> > [  224.253811]  0000000000000000 ffff8806e826f8f8 ffffffff81127822 ffffffff831a5968
> > [  224.270647]  ffff8806000000ed ffffed00dd04df21 0000000000000000 ffff880754ae3078
> > [  224.287462] Call Trace:
> > [  224.292953]  [<ffffffff81e13e54>] dump_stack+0x68/0x94
> > [  224.304433]  [<ffffffff81127822>] __warn+0x172/0x1b0
> > [  224.315514]  [<ffffffff811278f7>] warn_slowpath_fmt+0x97/0xb0
> > [  224.351845]  [<ffffffff816bbc1f>] sysfs_remove_group+0x10f/0x150
> > [  224.365252]  [<ffffffff81377814>] blk_trace_remove_sysfs+0x14/0x20
> > [  224.379021]  [<ffffffff81d966bf>] blk_unregister_queue+0xcf/0x120
> > [  224.392589]  [<ffffffff81dc8c7e>] del_gendisk+0x26e/0x600
> > [  224.432345]  [<ffffffff8215ea75>] sd_remove+0xf5/0x1b0
> > [  224.443808]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
> > [  224.458151]  [<ffffffff820dd475>] device_release_driver+0x25/0x40
> > [  224.471734]  [<ffffffff820dbd81>] bus_remove_device+0x2e1/0x4b0
> > [  224.484926]  [<ffffffff820d25ad>] device_del+0x37d/0x680
> > [  224.555423]  [<ffffffff82148225>] __scsi_remove_device+0x1e5/0x250
> > [  224.569196]  [<ffffffff821443bc>] scsi_forget_host+0x12c/0x1e0
> > [  224.582197]  [<ffffffff82120b6c>] scsi_remove_host+0x10c/0x300
> > [  224.595227]  [<ffffffffa0066cc9>] scsih_remove+0x339/0x670 [mpt3sas]
> > [  224.609383]  [<ffffffff81edd400>] pci_device_remove+0x70/0x110
> > [  224.622379]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
> > [  224.636724]  [<ffffffff820de7b3>] driver_detach+0x183/0x200
> > [  224.649166]  [<ffffffff820dc61f>] bus_remove_driver+0xdf/0x200
> > [  224.662185]  [<ffffffff820df2e7>] driver_unregister+0x67/0xa0
> > [  224.675009]  [<ffffffff81edb2ee>] pci_unregister_driver+0x1e/0xe0
> > [  224.688613]  [<ffffffffa0093a3a>] _mpt3sas_exit+0x23/0x5e9 [mpt3sas]
> > [  224.702782]  [<ffffffff812a2d1b>] SyS_delete_module+0x2eb/0x390
> > [  224.743359]  [<ffffffff829529e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> > [  224.759036] ---[ end trace 13f1919bf2ec7bb0 ]---
> >
> > My patch fixes that too :)
> >
> > The only driver besides mpt*sas that calls sas_delete_port() explicitly is
> > HPSA, and it does it in the opposite order mpt3sas does: scsi_remove_host()
> > first.
> >
> > Thanks,
> > Calvin
> >
> >> -----Original Message-----
> >> From: Calvin Owens [mailto:calvinowens@fb.com]
> >> Sent: Monday, May 16, 2016 2:25 PM
> >> To: Elliott, Robert (Persistent Memory)
> >> Cc: Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; James E.J.
> >> Bottomley; Martin K. Petersen; MPT-FusionLinux.pdl@broadcom.com;
> >> linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> >> kernel-team@fb.com; calvinowens@fb.com
> >> Subject: Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
> >> objects
> >>
> >> On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
> >> >
> >> >
> >> > > -----Original Message-----
> >> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> >> > > owner@vger.kernel.org] On Behalf Of Calvin Owens
> >> > > Sent: Friday, May 13, 2016 3:28 PM
> >> > ...
> >> > > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS
> >> > > PHY objects
> >> > >
> >> > ...
> >> > > The issue is that enclosure_remove_device() expects to be able to
> >> > > re-add the device that was previously enclosured: so with SES
> >> > > running, the order we unwind things matters in a way it otherwise
> >> > > wouldn't.
> >> > >
> >> > > Since mpt3sas deletes the SAS objects before the SCSI hosts,
> >> > > enclosure ends up trying to re-parent the SCSI device from the
> >> > > enclosure to the SAS PHY which has already been deleted. This obviously
> >> > > ends in sadness.
> >> > >
> >> > > The fix appears to be simple: just call scsi_remove_host() before we
> >> > > call
> >> > > sas_port_delete() and/or sas_remove_host().
> >> > >
> >> > ...
> >> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> >> > > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
> >> > >           _scsih_raid_device_remove(ioc, raid_device);
> >> > >   }
> >> > >
> >> > > + scsi_remove_host(shost);
> >> > > +
> >> > >   /* free ports attached to the sas_host */
> >> > >   list_for_each_entry_safe(mpt3sas_port, next_port,
> >> > >      &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@
> >> > > void scsih_remove(struct pci_dev *pdev)
> >> > >   }
> >> > >
> >> > >   sas_remove_host(shost);
> >> > > - scsi_remove_host(shost);
> >> > >   mpt3sas_base_detach(ioc);
> >> > >   spin_lock(&gioc_lock);
> >> > >   list_del(&ioc->list);
> >> >
> >> > That change matches the pattern of all other drivers calling
> >> > sas_remove_host, except for one:
> >> >     drivers/message/fusion/mptsas.c
> >> >
> >> > That consensus means this comment in drivers/scsi/scsi_transport_sas.c
> >> > is wrong:
> >> >
> >> > /**
> >> >  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
> >> >  * @shost:      Scsi Host that is torn down
> >> >  *
> >> >  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
> >> >  * Must be called just before scsi_remove_host for SAS HBAs.
> >> >  */
> >>
> >> Yes, exactly: that comment appears to be backwards, and as you say most
> >> drivers appear to do the opposite (I looked at HPSA at least, which calls
> >> sas_port_delete() before scsi_remove_host()).
> >>
> >> I suppose I should send a patch to fix the comment as well? It'd be nice to
> >> get some testing to be certain I'm not breaking somebody else's setup by
> >> fixing mine though...
> >>
> >> Thanks,
> >> Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
  2016-05-18 17:49           ` Calvin Owens
@ 2016-05-19  4:06             ` Sreekanth Reddy
  0 siblings, 0 replies; 8+ messages in thread
From: Sreekanth Reddy @ 2016-05-19  4:06 UTC (permalink / raw)
  To: Calvin Owens
  Cc: Sathya Prakash Veerichetty, Elliott, Robert (Persistent Memory),
	Chaitra Basappa, Suganath Prabu Subramani, James E.J. Bottomley,
	Martin K. Petersen, PDL-MPT-FUSIONLINUX, linux-scsi,
	linux-kernel, kernel-team

Hi Calvin,

Here is the log w.r.t target drive 2:0:0:0 with your changes,

mpt3sas version 12.100.00.00 unloading
kobject: '2:0:0:0' (ffff8800d50f1010): kobject_uevent_env
kobject: '2:0:0:0' (ffff8800d50f1010): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/bsg/2:0:0:0'
kobject: 'bsg' (ffff88011a011780): kobject_cleanup, parent ffff8800d58611a8
kobject: 'bsg' (ffff88011a011780): auto cleanup kobject_del
kobject: 'bsg' (ffff88011a011780): calling ktype release
kobject: 'bsg': free name
kobject: '2:0:0:0' (ffff8800d50f1010): kobject_cleanup, parent           (null)
kobject: '2:0:0:0' (ffff8800d50f1010): calling ktype release
kobject: '2:0:0:0': free name
kobject: 'sg0' (ffff8800d50f1810): kobject_uevent_env
kobject: 'sg0' (ffff8800d50f1810): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/scsi_generic/sg0'
kobject: 'scsi_generic' (ffff88011a011e80): kobject_cleanup, parent
ffff8800d58611a8
kobject: 'scsi_generic' (ffff88011a011e80): auto cleanup kobject_del
kobject: 'scsi_generic' (ffff88011a011e80): calling ktype release
kobject: 'scsi_generic': free name
kobject: 'sg0' (ffff8800d50f1810): kobject_cleanup, parent           (null)
kobject: 'sg0' (ffff8800d50f1810): calling ktype release
kobject: 'sg0': free name
kobject: '(null)' (ffff88011a5af800): kobject_cleanup, parent           (null)
kobject: '(null)' (ffff88011a5af800): calling ktype release
kobject: '(null)' (ffff8800da2a2880): kobject_cleanup, parent           (null)
kobject: '(null)' (ffff8800da2a2880): calling ktype release
kobject: '2:0:0:0' (ffff8800d5861478): kobject_uevent_env
kobject: '2:0:0:0' (ffff8800d5861478): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/scsi_device/2:0:0:0'
kobject: 'scsi_device' (ffff88011a011b00): kobject_cleanup, parent
ffff8800d58611a8
kobject: 'scsi_device' (ffff88011a011b00): auto cleanup kobject_del
kobject: 'scsi_device' (ffff88011a011b00): calling ktype release
kobject: 'scsi_device': free name
kobject: '2:0:0:0' (ffff8800d5861478): kobject_cleanup, parent           (null)
kobject: '2:0:0:0' (ffff8800d5861478): calling ktype release
kobject: '2:0:0:0': free name
kobject: '2:0:0:0' (ffff8800da3de420): kobject_uevent_env
kobject: '2:0:0:0' (ffff8800da3de420): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0'
kobject: 'scsi_disk' (ffff8800daa11180): kobject_cleanup, parent
ffff8800d58611a8
kobject: 'scsi_disk' (ffff8800daa11180): auto cleanup kobject_del
kobject: 'scsi_disk' (ffff8800daa11180): calling ktype release
kobject: 'scsi_disk': free name
kobject: 'integrity' (ffff8800d98d2c00): kobject_uevent_env
kobject: 'integrity' (ffff8800d98d2c00): kobject_uevent_env: filter
function caused the event to drop!
kobject: 'integrity' (ffff8800d98d2c00): kobject_cleanup, parent
    (null)
kobject: 'integrity' (ffff8800d98d2c00): does not have a release()
function, it is broken and must be fixed.
kobject: 'integrity': free name
kobject: 'iosched' (ffff8800d488b410): kobject_uevent_env
kobject: 'iosched' (ffff8800d488b410): kobject_uevent_env: filter
function caused the event to drop!
kobject: 'queue' (ffff8800d9691120): kobject_uevent_env
kobject: 'queue' (ffff8800d9691120): kobject_uevent_env: filter
function caused the event to drop!
kobject: 'holders' (ffff8800dae04000): kobject_cleanup, parent ffff8800d98d2880
kobject: 'holders' (ffff8800dae04000): auto cleanup kobject_del
kobject: 'holders' (ffff8800dae04000): calling ktype release
kobject: (ffff8800dae04000): dynamic_kobj_release
kobject: 'holders': free name
kobject: 'slaves' (ffff8800d9851740): kobject_cleanup, parent ffff8800d98d2880
kobject: 'slaves' (ffff8800d9851740): auto cleanup kobject_del
kobject: 'slaves' (ffff8800d9851740): calling ktype release
kobject: (ffff8800d9851740): dynamic_kobj_release
kobject: 'slaves': free name
kobject: 'sda' (ffff8800d98d2880): kobject_uevent_env
kobject: 'sda' (ffff8800d98d2880): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/block/sda'
kobject: 'block' (ffff8800d9fe4c80): kobject_cleanup, parent ffff8800d58611a8
kobject: 'block' (ffff8800d9fe4c80): auto cleanup kobject_del
kobject: 'block' (ffff8800d9fe4c80): calling ktype release
kobject: 'block': free name
sd 2:0:0:0: [sda] Synchronizing SCSI cache
sd 2:0:0:0: tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
sd 2:0:0:0: tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
sd 2:0:0:0: tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
kobject: '2:0:0:0' (ffff8800da3de420): kobject_cleanup, parent           (null)
kobject: '2:0:0:0' (ffff8800da3de420): calling ktype release
kobject: 'sda' (ffff8800d98d2880): kobject_cleanup, parent           (null)
kobject: 'sda' (ffff8800d98d2880): calling ktype release
kobject: 'sda': free name
kobject: '2:0:0:0': free name
kobject: '2:0:0:0' (ffff8800d58611a8): kobject_uevent_env
kobject: '2:0:0:0' (ffff8800d58611a8): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0'
kobject: '8:0' (ffff8800d223d410): kobject_uevent_env
kobject: '8:0' (ffff8800d223d410): fill_kobj_path: path =
'/devices/virtual/bdi/8:0'
kobject: '8:0' (ffff8800d223d410): kobject_cleanup, parent           (null)
kobject: '8:0' (ffff8800d223d410): calling ktype release
kobject: '8:0': free name
kobject: 'target2:0:0' (ffff8800d9615838): kobject_uevent_env
kobject: 'target2:0:0' (ffff8800d9615838): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/target2:0:0'
kobject: '2:0:0:0' (ffff8800d58611a8): kobject_cleanup, parent           (null)
kobject: '2:0:0:0' (ffff8800d58611a8): calling ktype release
kobject: 'queue' (ffff8800d9691120): kobject_cleanup, parent           (null)
kobject: 'queue' (ffff8800d9691120): calling ktype release
kobject: 'iosched' (ffff8800d488b410): kobject_cleanup, parent           (null)
kobject: 'iosched' (ffff8800d488b410): calling ktype release
kobject: 'iosched': free name
kobject: 'queue': free name
kobject: 'target2:0:0' (ffff8800d9615838): kobject_cleanup, parent
      (null)
kobject: 'target2:0:0' (ffff8800d9615838): calling ktype release
kobject: 'target2:0:0': free name
kobject: '2:0:0:0': free name
kobject: 'sas_host2' (ffff8800d485d010): kobject_uevent_env
kobject: 'sas_host2' (ffff8800d485d010): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/bsg/sas_host2'
kobject: 'bsg' (ffff8800d5a3df00): kobject_cleanup, parent ffff8800da5f8218
kobject: 'bsg' (ffff8800d5a3df00): auto cleanup kobject_del
kobject: 'bsg' (ffff8800d5a3df00): calling ktype release
kobject: 'bsg': free name
kobject: 'sas_host2' (ffff8800d485d010): kobject_cleanup, parent
    (null)
kobject: 'sas_host2' (ffff8800d485d010): calling ktype release
kobject: 'sas_host2': free name
kobject: '(null)' (ffff8800d985a6b8): kobject_cleanup, parent           (null)
kobject: '(null)' (ffff8800d985a6b8): calling ktype release
kobject: '(null)' (ffff8800da26dc10): kobject_cleanup, parent           (null)
kobject: '(null)' (ffff8800da26dc10): calling ktype release
kobject: 'host2' (ffff88011a659838): kobject_uevent_env
kobject: 'host2' (ffff88011a659838): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/sas_host/host2'
kobject: 'sas_host' (ffff8800d5a3d680): kobject_cleanup, parent ffff8800da5f8218
kobject: 'sas_host' (ffff8800d5a3d680): auto cleanup kobject_del
kobject: 'sas_host' (ffff8800d5a3d680): calling ktype release
kobject: 'sas_host': free name
kobject: 'host2' (ffff88011a659838): kobject_cleanup, parent           (null)
kobject: 'host2' (ffff88011a659838): calling ktype release
kobject: 'host2': free name
kobject: 'host2' (ffff8800da5f84e8): kobject_uevent_env
kobject: 'host2' (ffff8800da5f84e8): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2/scsi_host/host2'
kobject: 'scsi_host' (ffff8800d9c96180): kobject_cleanup, parent
ffff8800da5f8218
kobject: 'scsi_host' (ffff8800d9c96180): auto cleanup kobject_del
kobject: 'scsi_host' (ffff8800d9c96180): calling ktype release
kobject: 'scsi_host': free name
kobject: 'host2' (ffff8800da5f84e8): kobject_cleanup, parent           (null)
kobject: 'host2' (ffff8800da5f84e8): calling ktype release
kobject: 'host2': free name
kobject: 'host2' (ffff8800da5f8218): kobject_uevent_env
kobject: 'host2' (ffff8800da5f8218): fill_kobj_path: path =
'/devices/pci0000:00/0000:00:07.0/host2'
mpt3sas_cm0: _scsih_remove_device: enter: handle(0x0019),
sas_addr(0x5000c500001e1d11)
mpt3sas_cm0: _scsih_remove_device: enter: enclosure logical
id(0x500605b077777778), slot(4)
mpt3sas_cm0: _scsih_remove_device: enter: enclosure level(0x0000),
connector name(     \x01)
 port-2:0: remove: sas_addr(0x5000c500001e1d11), phy(4)
------------[ cut here ]------------
WARNING: CPU: 7 PID: 3953 at fs/sysfs/group.c:237 sysfs_remove_group+0x97/0xa0
sysfs group ffffffff81ccdac0 not found for kobject 'end_device-2:0'
Modules linked in: mpt3sas(OE-) ip6table_filter ip6_tables ebtable_nat
ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter
ip_tables bridge stp llc nfsd lockd grace nfs_acl auth_rpcgss autofs4
sunrpc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan
vhost tun uinput sg sd_mod scsi_transport_sas raid_class
virtio_balloon snd_hda_codec_generic pcspkr snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm
snd_timer snd soundcore i2c_piix4 i2c_core 8139too 8139cp mii
dm_mod(E) ext4(E) mbcache(E) jbd2(E) virtio_blk(E) virtio_pci(E)
virtio_ring(E) virtio(E) pata_acpi(E) ata_generic(E) ata_piix(E) [last
unloaded: mpt3sas]
CPU: 7 PID: 3953 Comm: rmmod Tainted: G        W  OE   4.6.0+ #2
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 0000000000000000 ffff8800d224b978 ffffffff812d93b9 ffffffff81248b57
 ffff8800d224b9d8 ffff8800d224b9d8 0000000000000000 ffff8800d224b9c8
 ffffffff8106597d ffff8800d224b9a8 000000edd9d8fe10 ffff8800d224b9b8
Call Trace:
 [<ffffffff812d93b9>] dump_stack+0x51/0x78
 [<ffffffff81248b57>] ? sysfs_remove_group+0x97/0xa0
 [<ffffffff8106597d>] __warn+0xfd/0x120
 [<ffffffff81065a59>] warn_slowpath_fmt+0x49/0x50
 [<ffffffff812448c3>] ? kernfs_find_and_get_ns+0x53/0x70
 [<ffffffff81248b57>] sysfs_remove_group+0x97/0xa0
 [<ffffffff814000aa>] dpm_sysfs_remove+0x5a/0x70
 [<ffffffff813f4189>] device_del+0x49/0x200
 [<ffffffff812daceb>] ? idr_remove+0x4b/0x110
 [<ffffffff813f4362>] device_unregister+0x22/0x60
 [<ffffffff812c6ef2>] bsg_unregister_queue+0x62/0xa0
 [<ffffffffa026b1d2>] sas_rphy_remove+0x42/0x80 [scsi_transport_sas]
 [<ffffffffa026bc96>] sas_rphy_delete+0x16/0x30 [scsi_transport_sas]
 [<ffffffffa026bcda>] sas_port_delete+0x2a/0x120 [scsi_transport_sas]
 [<ffffffffa026b130>] ? sas_port_delete_phy+0x70/0x90 [scsi_transport_sas]
 [<ffffffffa059d8e5>] mpt3sas_transport_port_remove+0x185/0x1c0 [mpt3sas]
 [<ffffffffa0592296>] _scsih_remove_device+0x136/0x330 [mpt3sas]
 [<ffffffff812dbc62>] ? kobj_kset_leave+0x52/0x60
 [<ffffffffa05987a1>] mpt3sas_device_remove_by_sas_address+0xa1/0xe0 [mpt3sas]
 [<ffffffffa0598cd0>] scsih_remove+0xe0/0x1d0 [mpt3sas]
 [<ffffffff8131f4e4>] pci_device_remove+0x44/0xd0
 [<ffffffff8140332e>] ? __pm_runtime_idle+0x8e/0x90
 [<ffffffff813f7c94>] __device_release_driver+0xb4/0x160
 [<ffffffff813f8b2e>] driver_detach+0xbe/0xc0
 [<ffffffff813f6a39>] bus_remove_driver+0x59/0xc0
 [<ffffffff813f90b0>] driver_unregister+0x30/0x60
 [<ffffffff8131f5c3>] pci_unregister_driver+0x23/0x80
 [<ffffffffa05a6b29>] _mpt3sas_exit+0x25/0x38 [mpt3sas]
 [<ffffffff810e7ba3>] SyS_delete_module+0x183/0x1d0
 [<ffffffff81106e20>] ? __audit_syscall_entry+0xb0/0x110
 [<ffffffff81002426>] ? do_audit_syscall_entry+0x66/0x70
 [<ffffffff81002457>] ? syscall_trace_enter_phase1+0x27/0x30
 [<ffffffff810028f3>] ? syscall_trace_enter+0x43/0x70
 [<ffffffff81002d4d>] do_syscall_64+0x6d/0x160
 [<ffffffff8100238c>] ? prepare_exit_to_usermode+0xbc/0xf0
 [<ffffffff8160ec7c>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7c767b1804b8a6d5 ]---
kobject: 'end_device-2:0' (ffff8800dad0c010): kobject_uevent_env
kobject: 'end_device-2:0' (ffff8800dad0c010): fill_kobj_path: path =
'/host2/port-2:0/end_device-2:0/bsg/end_device-2:0'
kobject: 'bsg' (ffff8800d9fe4180): kobject_cleanup, parent ffff8800da435c10
kobject: 'bsg' (ffff8800d9fe4180): auto cleanup kobject_del
kobject: 'bsg' (ffff8800d9fe4180): calling ktype release
kobject: 'bsg': free name
kobject: 'end_device-2:0' (ffff8800dad0c010): kobject_cleanup, parent
         (null)
kobject: 'end_device-2:0' (ffff8800dad0c010): calling ktype release
kobject: 'end_device-2:0': free name
.....

~Sreekanth

On Wed, May 18, 2016 at 11:19 PM, Calvin Owens <calvinowens@fb.com> wrote:
> On Wednesday 05/18 at 18:44 +0530, Sreekanth Reddy wrote:
>> On Tue, May 17, 2016 at 8:43 AM, Calvin Owens <calvinowens@fb.com> wrote:
>> > On Monday 05/16 at 15:51 -0600, Sathya Prakash Veerichetty wrote:
>> >> Our understanding is the relationship between the SCSI host and SAS end
>> >> devices is a parent-child and before ripping of SCSI host we need to rip of
>> >> all the children.
>> >
>> > I know at least HPSA does the opposite, Elliott observes that other
>> > drivers do as well.
>> >
>> >> Why the enclosure ends up trying to re-parent the SCSI device from the
>> >> enclosure to the SAS PHY even after we remove the SAS Phy?
>> >
>> > Take this path in /sys:
>> >
>> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/scsi_device/6:0:0:0
>> >
>> > The problem is "port-6:0:0" in the hierarchy: since mpt3sas explicitly
>> > calls sas_port_delete() before scsi_remove_host(), that node is removed
>> > from the device hierarchy and seems to orphan what lies beneath it:
>> >
>> > [  955.119328] kobject: '6:0:0:0' (ffff88074e2e1008): kobject_uevent_env
>> > [  955.133890] kobject: '6:0:0:0' (ffff88074e2e1008): fill_kobj_path: path = '/end_device-6:0:0/target6:0:0/6:0:0:0/bsg/6:0:0:0'
>> > <WARN/OOPS sadness follows>
>>
>> [Sreekanth] Calvin, Now with your patch we are observing same type of
>> <WARN/OOPS> while unregistered the drive with scsi_transport layer
>> after calling scsi_remove_host() on the same driver unload path,
>>
>> kobject: 'end_device-2:0' (ffff8800dad0c010): kobject_uevent_env
>> kobject: 'end_device-2:0' (ffff8800dad0c010): fill_kobj_path: path =
>> '/host2/port-2:0/end_device-2:0/bsg/end_device-2:0'
>>
>> As scsi_remove_host() has freed it's parent
>> "/devices/pci0000:00/0000:00:07.0" we are observing these warnings.
>>
>> here is it's complete path
>>
>>  fill_kobj_path: path =
>> '/devices/pci0000:00/0000:00:07.0/host2/port-2:0/end_device-2:0/bsg/end_device-2:0',
>
> Could you post the stacktraces for your WARN/OOPS?
>
>> Until on  ~3.19/4.0 kernel all the thing are working fine, we haven't
>> seen these types of kernel warning. Also we haven't done any changes
>> in mpt3sas driver w.r.t to this after 4.0 kernel.
>
> I'll dig around more. In any case, my patch seems to be papering over
> the problem rather than solving it, so we need a different fix.
>
> Thanks,
> Calvin
>
>> >
>> > As the above from CONFIG_DEBUG_KOBJECT suggests, "end_device-6:0:0" no longer
>> > has a parent. The path to the enclosured device is:
>> >
>> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:15/end_device-6:0:15/target6:0:15/6:0:15:0/enclosure/6:0:15:0/ArrayDevice09
>> >
>> > Enclosure wants to re-add the device outside the enclosure:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/enclosure.c#n404
>> >
>> > We melt when trying to re-add "ArrayDevice09" to the "6:0:15:0" directory
>> > above the "enclosure" directory best I can tell. This is *incredibly*
>> > confusing because kobj_add_internal() actually clobbers the parent pointer
>> > if it's NULL:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/kobject.c#n216
>> >
>> > So even though it appears that the parent was "6:0:15:0" here, it was
>> > actually NULL when passed into kobj_add_internal(). Thus, we try to do
>> > the add with the kset as our parent, and we get -ENOENT because that
>> > node has been marked as inactive in kernfs as per here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n743
>> >
>> > Why is it inactive? I think, because __kernfs_remove() deleted an ancestor:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/kernfs/dir.c#n1243
>> >
>> > ...which brings us full circle.
>> >
>> >> Doesn't this need to be taken care in enclosure_removal?
>> >
>> > Even with enclosure compiled out, I still get a WARN like the following
>> > for every block device being removed at rmmod (-dirty here is just from me
>> > applying a fix to SES I sent to the list a few days ago):
>> >
>> > [  224.057286] ------------[ cut here ]------------
>> > [  224.067696] WARNING: CPU: 1 PID: 5412 at fs/sysfs/group.c:237 sysfs_remove_group+0x10f/0x150
>> > [  224.088007] sysfs group ffffffff83109ba0 not found for kobject 'sdc'
>> > [  224.102283] Modules linked in: mpt3sas(-) <snip>
>> > [  224.200483] CPU: 0 PID: 5412 Comm: rmmod Not tainted 4.6.0-dirty #4
>> > [  224.219797] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
>> > [  224.237001]  ffffffff82af6280 ffff8806e826f8b0 ffffffff81e13e54 ffff8806e826f928
>> > [  224.253811]  0000000000000000 ffff8806e826f8f8 ffffffff81127822 ffffffff831a5968
>> > [  224.270647]  ffff8806000000ed ffffed00dd04df21 0000000000000000 ffff880754ae3078
>> > [  224.287462] Call Trace:
>> > [  224.292953]  [<ffffffff81e13e54>] dump_stack+0x68/0x94
>> > [  224.304433]  [<ffffffff81127822>] __warn+0x172/0x1b0
>> > [  224.315514]  [<ffffffff811278f7>] warn_slowpath_fmt+0x97/0xb0
>> > [  224.351845]  [<ffffffff816bbc1f>] sysfs_remove_group+0x10f/0x150
>> > [  224.365252]  [<ffffffff81377814>] blk_trace_remove_sysfs+0x14/0x20
>> > [  224.379021]  [<ffffffff81d966bf>] blk_unregister_queue+0xcf/0x120
>> > [  224.392589]  [<ffffffff81dc8c7e>] del_gendisk+0x26e/0x600
>> > [  224.432345]  [<ffffffff8215ea75>] sd_remove+0xf5/0x1b0
>> > [  224.443808]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
>> > [  224.458151]  [<ffffffff820dd475>] device_release_driver+0x25/0x40
>> > [  224.471734]  [<ffffffff820dbd81>] bus_remove_device+0x2e1/0x4b0
>> > [  224.484926]  [<ffffffff820d25ad>] device_del+0x37d/0x680
>> > [  224.555423]  [<ffffffff82148225>] __scsi_remove_device+0x1e5/0x250
>> > [  224.569196]  [<ffffffff821443bc>] scsi_forget_host+0x12c/0x1e0
>> > [  224.582197]  [<ffffffff82120b6c>] scsi_remove_host+0x10c/0x300
>> > [  224.595227]  [<ffffffffa0066cc9>] scsih_remove+0x339/0x670 [mpt3sas]
>> > [  224.609383]  [<ffffffff81edd400>] pci_device_remove+0x70/0x110
>> > [  224.622379]  [<ffffffff820dd210>] __device_release_driver+0x160/0x3a0
>> > [  224.636724]  [<ffffffff820de7b3>] driver_detach+0x183/0x200
>> > [  224.649166]  [<ffffffff820dc61f>] bus_remove_driver+0xdf/0x200
>> > [  224.662185]  [<ffffffff820df2e7>] driver_unregister+0x67/0xa0
>> > [  224.675009]  [<ffffffff81edb2ee>] pci_unregister_driver+0x1e/0xe0
>> > [  224.688613]  [<ffffffffa0093a3a>] _mpt3sas_exit+0x23/0x5e9 [mpt3sas]
>> > [  224.702782]  [<ffffffff812a2d1b>] SyS_delete_module+0x2eb/0x390
>> > [  224.743359]  [<ffffffff829529e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
>> > [  224.759036] ---[ end trace 13f1919bf2ec7bb0 ]---
>> >
>> > My patch fixes that too :)
>> >
>> > The only driver besides mpt*sas that calls sas_delete_port() explicitly is
>> > HPSA, and it does it in the opposite order mpt3sas does: scsi_remove_host()
>> > first.
>> >
>> > Thanks,
>> > Calvin
>> >
>> >> -----Original Message-----
>> >> From: Calvin Owens [mailto:calvinowens@fb.com]
>> >> Sent: Monday, May 16, 2016 2:25 PM
>> >> To: Elliott, Robert (Persistent Memory)
>> >> Cc: Sathya Prakash; Chaitra P B; Suganath Prabu Subramani; James E.J.
>> >> Bottomley; Martin K. Petersen; MPT-FusionLinux.pdl@broadcom.com;
>> >> linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
>> >> kernel-team@fb.com; calvinowens@fb.com
>> >> Subject: Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY
>> >> objects
>> >>
>> >> On Friday 05/13 at 21:17 +0000, Elliott, Robert (Persistent Memory) wrote:
>> >> >
>> >> >
>> >> > > -----Original Message-----
>> >> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> >> > > owner@vger.kernel.org] On Behalf Of Calvin Owens
>> >> > > Sent: Friday, May 13, 2016 3:28 PM
>> >> > ...
>> >> > > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS
>> >> > > PHY objects
>> >> > >
>> >> > ...
>> >> > > The issue is that enclosure_remove_device() expects to be able to
>> >> > > re-add the device that was previously enclosured: so with SES
>> >> > > running, the order we unwind things matters in a way it otherwise
>> >> > > wouldn't.
>> >> > >
>> >> > > Since mpt3sas deletes the SAS objects before the SCSI hosts,
>> >> > > enclosure ends up trying to re-parent the SCSI device from the
>> >> > > enclosure to the SAS PHY which has already been deleted. This obviously
>> >> > > ends in sadness.
>> >> > >
>> >> > > The fix appears to be simple: just call scsi_remove_host() before we
>> >> > > call
>> >> > > sas_port_delete() and/or sas_remove_host().
>> >> > >
>> >> > ...
>> >> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
>> >> > > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev)
>> >> > >           _scsih_raid_device_remove(ioc, raid_device);
>> >> > >   }
>> >> > >
>> >> > > + scsi_remove_host(shost);
>> >> > > +
>> >> > >   /* free ports attached to the sas_host */
>> >> > >   list_for_each_entry_safe(mpt3sas_port, next_port,
>> >> > >      &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@
>> >> > > void scsih_remove(struct pci_dev *pdev)
>> >> > >   }
>> >> > >
>> >> > >   sas_remove_host(shost);
>> >> > > - scsi_remove_host(shost);
>> >> > >   mpt3sas_base_detach(ioc);
>> >> > >   spin_lock(&gioc_lock);
>> >> > >   list_del(&ioc->list);
>> >> >
>> >> > That change matches the pattern of all other drivers calling
>> >> > sas_remove_host, except for one:
>> >> >     drivers/message/fusion/mptsas.c
>> >> >
>> >> > That consensus means this comment in drivers/scsi/scsi_transport_sas.c
>> >> > is wrong:
>> >> >
>> >> > /**
>> >> >  * sas_remove_host  -  tear down a Scsi_Host's SAS data structures
>> >> >  * @shost:      Scsi Host that is torn down
>> >> >  *
>> >> >  * Removes all SAS PHYs and remote PHYs for a given Scsi_Host.
>> >> >  * Must be called just before scsi_remove_host for SAS HBAs.
>> >> >  */
>> >>
>> >> Yes, exactly: that comment appears to be backwards, and as you say most
>> >> drivers appear to do the opposite (I looked at HPSA at least, which calls
>> >> sas_port_delete() before scsi_remove_host()).
>> >>
>> >> I suppose I should send a patch to fix the comment as well? It'd be nice to
>> >> get some testing to be certain I'm not breaking somebody else's setup by
>> >> fixing mine though...
>> >>
>> >> Thanks,
>> >> Calvin

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-05-19  4:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-13 20:28 [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects Calvin Owens
2016-05-13 21:17 ` Elliott, Robert (Persistent Memory)
2016-05-16 20:25   ` Calvin Owens
2016-05-16 21:51     ` Sathya Prakash Veerichetty
2016-05-17  3:13       ` Calvin Owens
2016-05-18 13:14         ` Sreekanth Reddy
2016-05-18 17:49           ` Calvin Owens
2016-05-19  4:06             ` Sreekanth Reddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).