All of lore.kernel.org
* [PATCH] nvme-fc: fix module unloads while lports still pending
@ 2019-06-25 16:06 James Smart
  2019-06-26 22:20 ` Arun Easi
  2019-06-28  7:16 ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: James Smart @ 2019-06-25 16:06 UTC (permalink / raw)


Current code allows the module to be unloaded even if there are
pending data structures, such as localports and the controllers on
them, whose reference counts have not yet dropped to allow their
removal.

Fix by taking a reference on the module when the first localport is
registered; the reference is released when the last localport is
removed.

Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/nvme/host/fc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index b7662e237eb9..bd80e2c475da 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -229,6 +229,8 @@ nvme_fc_free_lport(struct kref *ref)
 	/* remove from transport list */
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_del(&lport->port_list);
+	if (list_empty(&nvme_fc_lport_list))
+		module_put(THIS_MODULE);
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
 	ida_simple_remove(&nvme_fc_local_port_cnt, lport->localport.port_num);
@@ -398,6 +400,11 @@ nvme_fc_register_localport(struct nvme_fc_port_info *pinfo,
 	newrec->localport.port_num = idx;
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
+	if (list_empty(&nvme_fc_lport_list) && !try_module_get(THIS_MODULE)) {
+		spin_unlock_irqrestore(&nvme_fc_lock, flags);
+		ret = -ENODEV;
+		goto out_dev_put;
+	}
 	list_add_tail(&newrec->port_list, &nvme_fc_lport_list);
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
@@ -407,6 +414,9 @@ nvme_fc_register_localport(struct nvme_fc_port_info *pinfo,
 	*portptr = &newrec->localport;
 	return 0;
 
+out_dev_put:
+	if (dev)
+		put_device(dev);
 out_ida_put:
 	ida_simple_remove(&nvme_fc_local_port_cnt, idx);
 out_fail_kfree:
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 10+ messages in thread
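The pin-on-first/unpin-on-last scheme of the patch can be sketched in plain
userspace C. All names below (sim_try_module_get, register_lport, and the
counters) are illustrative stand-ins for the kernel primitives and the fc.c
code, not actual kernel API, and the nvme_fc_lock locking is elided:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins: lport_count models the nvme_fc_lport_list length,
 * module_refs models THIS_MODULE's refcount, and unload_started
 * models the state in which try_module_get() starts failing. */
static int lport_count;
static int module_refs;
static bool unload_started;

static bool sim_try_module_get(void)
{
	if (unload_started)
		return false;
	module_refs++;
	return true;
}

static void sim_module_put(void)
{
	module_refs--;
}

/* Mirrors the register path: pin the module only on the transition
 * from an empty to a non-empty localport list. */
static int register_lport(void)
{
	if (lport_count == 0 && !sim_try_module_get())
		return -1;	/* -ENODEV in the real patch */
	lport_count++;
	return 0;
}

/* Mirrors nvme_fc_free_lport(): unpin when the list goes empty again. */
static void free_lport(void)
{
	lport_count--;
	if (lport_count == 0)
		sim_module_put();
}
```

Note the invariant: the module holds at most one extra reference no matter
how many localports exist, and a registration attempted after unload has
begun is refused.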

* [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-25 16:06 [PATCH] nvme-fc: fix module unloads while lports still pending James Smart
@ 2019-06-26 22:20 ` Arun Easi
  2019-06-26 23:00   ` James Smart
  2019-06-28  7:16 ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Arun Easi @ 2019-06-26 22:20 UTC (permalink / raw)


On Tue, 25 Jun 2019, 9:06am, James Smart wrote:

> Current code allows the module to be unloaded even if there are
> pending data structures, such as localports and controllers on
> the localports, that have yet to hit their reference counting
> to remove them.
> 
> Fix by taking a reference out on the module upon the first localport
> registered, and which is cleared when the last localport is removed.
> 
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> ---
>  drivers/nvme/host/fc.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index b7662e237eb9..bd80e2c475da 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -229,6 +229,8 @@ nvme_fc_free_lport(struct kref *ref)
>  	/* remove from transport list */
>  	spin_lock_irqsave(&nvme_fc_lock, flags);
>  	list_del(&lport->port_list);
> +	if (list_empty(&nvme_fc_lport_list))
> +		module_put(THIS_MODULE);
>  	spin_unlock_irqrestore(&nvme_fc_lock, flags);
>  
>  	ida_simple_remove(&nvme_fc_local_port_cnt, lport->localport.port_num);
> @@ -398,6 +400,11 @@ nvme_fc_register_localport(struct nvme_fc_port_info *pinfo,
>  	newrec->localport.port_num = idx;
>  
>  	spin_lock_irqsave(&nvme_fc_lock, flags);
> +	if (list_empty(&nvme_fc_lport_list) && !try_module_get(THIS_MODULE)) {
> +		spin_unlock_irqrestore(&nvme_fc_lock, flags);
> +		ret = -ENODEV;
> +		goto out_dev_put;
> +	}
>  	list_add_tail(&newrec->port_list, &nvme_fc_lport_list);
>  	spin_unlock_irqrestore(&nvme_fc_lock, flags);
>  
> @@ -407,6 +414,9 @@ nvme_fc_register_localport(struct nvme_fc_port_info *pinfo,
>  	*portptr = &newrec->localport;
>  	return 0;
>  
> +out_dev_put:
> +	if (dev)
> +		put_device(dev);
>  out_ida_put:
>  	ida_simple_remove(&nvme_fc_local_port_cnt, idx);
>  out_fail_kfree:
> 

Changes look ok.

Did you hit this during testing? I'm wondering because the LLD would have 
unloaded prior to nvme_fc, so no lport/rport should remain if the LLD 
behaves well.

-Arun


* [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-26 22:20 ` Arun Easi
@ 2019-06-26 23:00   ` James Smart
  2019-06-27  0:02     ` [EXT] " Arun Easi
  0 siblings, 1 reply; 10+ messages in thread
From: James Smart @ 2019-06-26 23:00 UTC (permalink / raw)


On 6/26/2019 3:20 PM, Arun Easi wrote:
> Changes look ok.
> 
> Did you hit this during testing? Wondering because LLD would have unloaded
> prior to nvme_fc, so no lport/rport should remain if LLD behaves well.
> 
> -Arun
> 

It's been a longstanding annoyance - usually tied to test cases that 
used modprobe -r.   The issue is that the drivers would call the targetport 
unregister calls, which would release the targetport once all the 
outstanding ios were cancelled to the lldd and the link-side association 
terminated, but the controllers are in a reconnecting state and not yet torn 
down. From the driver's point of view, they are unlinked. Thus it was 
intended that the driver could unregister/unload/reload/reregister and, if 
possible, reconnect - allowing the nvme device to never be deleted. Given 
the lack of a module reference, when the driver module left, so did the 
transport module, even with the live data structures.

-- james


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-26 23:00   ` James Smart
@ 2019-06-27  0:02     ` Arun Easi
  2019-06-27  0:28       ` Arun Easi
  2019-06-27  0:34       ` James Smart
  0 siblings, 2 replies; 10+ messages in thread
From: Arun Easi @ 2019-06-27  0:02 UTC (permalink / raw)


On Wed, 26 Jun 2019, 4:00pm, James Smart wrote:

> External Email
> 
> ----------------------------------------------------------------------
> On 6/26/2019 3:20 PM, Arun Easi wrote:
> > Changes look ok.
> > 
> > Did you hit this during testing? Wondering because LLD would have unloaded
> > prior to nvme_fc, so no lport/rport should remain if LLD behaves well.
> > 
> > -Arun
> > 
> 
> It's been a longstanding annoyance - usually tied with test cases that used
> modprobe -r.   Issue is the drivers would call the targetport unregister
> calls, which would release the targetport once all the outstanding ios were
> cancelled to the lldd and the link-side association terminated, but the
> controllers are in a reconnecting and not yet torn down.

Did you mean a "nvme_reset_ctrl/nvme_delete_ctrl" is in progress?

> From the drivers point of view, they are unlinked. Thus it was 
> intended the driver could unregister/unload/reload/reregister and if 
> possible reconnect - allowing the nvme device to never be deleted. Given 
> the lack of module reference, when the driver module left, so did the 
> transport module, even with the live data structures.
> 

Thanks for the explanation. This reminds me of a similar issue we hit. The 
fix under consideration was to wait in nvme_fc module exit to quiesce 
rport/lport before proceeding.

Anyway, with the fix above, I am wondering: if nvme_fc is still active for a 
brief time just after the unload of the LLD, the unload of nvme_fc would fail 
(reporting it is in use), wouldn't it? If so, in addition to the above fix, a 
quiesce (flush) of such threads during module exit would give "modprobe 
-r" a better chance to unload both modules.

Thoughts?

-Arun
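The bounded-wait alternative Arun floats here could look roughly like the
following userspace sketch. The names and the tick-based wait are purely
illustrative (a kernel version would use wait_event_timeout() or an
msleep() loop, not these helpers):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins: pending_lports models lports not yet freed; ticks_waited
 * just counts how many wait intervals the exit path consumed. */
static int pending_lports;
static int ticks_waited;

/* Stand-in for one bounded wait interval in module exit. */
static void wait_one_tick(void)
{
	ticks_waited++;
}

/* Poll for outstanding lports for at most max_ticks intervals,
 * instead of looping forever. Returns true if everything quiesced
 * before the bound expired. */
static bool exit_wait_for_lports(int max_ticks)
{
	int i;

	for (i = 0; i < max_ticks && pending_lports > 0; i++)
		wait_one_tick();
	return pending_lports == 0;
}
```

If the bound expires with lports still pending, the exit path is back to the
problem the patch describes: tearing the module down under live structures.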


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-27  0:02     ` [EXT] " Arun Easi
@ 2019-06-27  0:28       ` Arun Easi
  2019-06-27  0:49         ` James Smart
  2019-06-27  0:34       ` James Smart
  1 sibling, 1 reply; 10+ messages in thread
From: Arun Easi @ 2019-06-27  0:28 UTC (permalink / raw)


Attaching the patch I was referring to earlier. Maybe the infinite loop in 
the exit path could be changed to a time-bound one.

-Arun

On Wed, 26 Jun 2019, 5:02pm, Arun Easi wrote:

> On Wed, 26 Jun 2019, 4:00pm, James Smart wrote:
> 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > On 6/26/2019 3:20 PM, Arun Easi wrote:
> > > Changes look ok.
> > > 
> > > Did you hit this during testing? Wondering because LLD would have unloaded
> > > prior to nvme_fc, so no lport/rport should remain if LLD behaves well.
> > > 
> > > -Arun
> > > 
> > 
> > It's been a longstanding annoyance - usually tied with test cases that used
> > modprobe -r.   Issue is the drivers would call the targetport unregister
> > calls, which would release the targetport once all the outstanding ios were
> > cancelled to the lldd and the link-side association terminated, but the
> > controllers are in a reconnecting and not yet torn down.
> 
> Did you mean a "nvme_reset_ctrl/nvme_delete_ctrl" is in progress?
> 
> > From the drivers point of view, they are unlinked. Thus it was 
> > intended the driver could unregister/unload/reload/reregister and if 
> > possible reconnect - allowing the nvme device to never be deleted. Given 
> > the lack of module reference, when the driver module left, so did the 
> > transport module, even with the live data structures.
> > 
> 
> Thanks for the explanation. This reminds me of a similar issue we hit. The 
> fix under consideration was to wait in nvme_fc module exit to quiesce 
> rport/lport before proceeding.
> 
> Anyway, with the fix above, I am wondering, if nvme_fc is active for a 
> brief time just after the unload of LLD, the unload of nvme_fc would fail 
> (saying in use), wouldn't it? If so, in addition to the above fix, a 
> quiesce (flush) of such threads during module exit would give "modprobe 
> -r" a better chance to unload both modules.
> 
> Thoughts?
> 
> -Arun
> 
> 
> 


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-27  0:02     ` [EXT] " Arun Easi
  2019-06-27  0:28       ` Arun Easi
@ 2019-06-27  0:34       ` James Smart
  2019-06-27  1:01         ` Arun Easi
  1 sibling, 1 reply; 10+ messages in thread
From: James Smart @ 2019-06-27  0:34 UTC (permalink / raw)


On 6/26/2019 5:02 PM, Arun Easi wrote:
> On Wed, 26 Jun 2019, 4:00pm, James Smart wrote:
> 
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 6/26/2019 3:20 PM, Arun Easi wrote:
>>> Changes look ok.
>>>
>>> Did you hit this during testing? Wondering because LLD would have unloaded
>>> prior to nvme_fc, so no lport/rport should remain if LLD behaves well.
>>>
>>> -Arun
>>>
>>
>> It's been a longstanding annoyance - usually tied with test cases that used
>> modprobe -r.   Issue is the drivers would call the targetport unregister
>> calls, which would release the targetport once all the outstanding ios were
>> cancelled to the lldd and the link-side association terminated, but the
>> controllers are in a reconnecting and not yet torn down.
> 
> Did you mean a "nvme_reset_ctrl/nvme_delete_ctrl" is in progress?

no.  The delete won't happen till the reconnect timer expires.

> 
>>  From the drivers point of view, they are unlinked. Thus it was
>> intended the driver could unregister/unload/reload/reregister and if
>> possible reconnect - allowing the nvme device to never be deleted. Given
>> the lack of module reference, when the driver module left, so did the
>> transport module, even with the live data structures.
>>
> 
> Thanks for the explanation. This reminds me of a similar issue we hit. The
> fix under consideration was to wait in nvme_fc module exit to quiesce
> rport/lport before proceeding.
> 
> Anyway, with the fix above, I am wondering, if nvme_fc is active for a
> brief time just after the unload of LLD, the unload of nvme_fc would fail
> (saying in use), wouldn't it? If so, in addition to the above fix, a
> quiesce (flush) of such threads during module exit would give "modprobe
> -r" a better chance to unload both modules.
> 
> Thoughts?
> 
> -Arun

True, the driver would unload but not the transport, but that's intended, 
as we're trying to allow the driver to reload without deleting the nvme 
device in the system. If the device is deleted, any real consumers of it 
will die if not multipathed. A prime example is installing a new driver 
with a bug fix.  I'm far more interested in that behavior than in making 
modprobe -r remove the most modules possible.

-- james


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-27  0:28       ` Arun Easi
@ 2019-06-27  0:49         ` James Smart
  2019-06-27  1:09           ` Arun Easi
  0 siblings, 1 reply; 10+ messages in thread
From: James Smart @ 2019-06-27  0:49 UTC (permalink / raw)


On 6/26/2019 5:28 PM, Arun Easi wrote:
> Attaching the patch I was referring earlier. Maybe the infinite loop in
> exit could be changed to a time bound one.
> 
> -Arun
> 

Interesting. I still prefer the stronger referencing while anything is 
outstanding, as I think it's really ugly to get into __exit at all when 
things haven't cleaned up, and once in __exit, there's no going back.
Also concerned with:
- hopefully module load checking prevents loading of another lldd
  while in this state. Ugly for that thing if it's rejected and the
  load has to be retried.
- I don't like the long delays (could be a minute plus) for the unload
  command while it waits - anyone stuck in the wait will be asking
  questions as to why.
- Exiting after a time delay isn't great, as it's just as bad as the
  existing code that left with things not cleaned up. I was hoping to
  correct that.

-- james


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-27  0:34       ` James Smart
@ 2019-06-27  1:01         ` Arun Easi
  0 siblings, 0 replies; 10+ messages in thread
From: Arun Easi @ 2019-06-27  1:01 UTC (permalink / raw)


On Wed, 26 Jun 2019, 5:34pm, James Smart wrote:

> On 6/26/2019 5:02 PM, Arun Easi wrote:
> > On Wed, 26 Jun 2019, 4:00pm, James Smart wrote:
> > 
> > > External Email
> > > 
> > > ----------------------------------------------------------------------
> > > On 6/26/2019 3:20 PM, Arun Easi wrote:
> > > > Changes look ok.
> > > > 
> > > > Did you hit this during testing? Wondering because LLD would have
> > > > unloaded
> > > > prior to nvme_fc, so no lport/rport should remain if LLD behaves well.
> > > > 
> > > > -Arun
> > > > 
> > > 
> > > It's been a longstanding annoyance - usually tied with test cases that
> > > used
> > > modprobe -r.   Issue is the drivers would call the targetport unregister
> > > calls, which would release the targetport once all the outstanding ios
> > > were
> > > cancelled to the lldd and the link-side association terminated, but the
> > > controllers are in a reconnecting and not yet torn down.
> > 
> > Did you mean a "nvme_reset_ctrl/nvme_delete_ctrl" is in progress?
> 
> no.  The delete won't happen till the reconnect timer expires.
> 
> > 
> > >  From the drivers point of view, they are unlinked. Thus it was
> > > intended the driver could unregister/unload/reload/reregister and if
> > > possible reconnect - allowing the nvme device to never be deleted. Given
> > > the lack of module reference, when the driver module left, so did the
> > > transport module, even with the live data structures.
> > > 
> > 
> > Thanks for the explanation. This reminds me of a similar issue we hit. The
> > fix under consideration was to wait in nvme_fc module exit to quiesce
> > rport/lport before proceeding.
> > 
> > Anyway, with the fix above, I am wondering, if nvme_fc is active for a
> > brief time just after the unload of LLD, the unload of nvme_fc would fail
> > (saying in use), wouldn't it? If so, in addition to the above fix, a
> > quiesce (flush) of such threads during module exit would give "modprobe
> > -r" a better chance to unload both modules.
> > 
> > Thoughts?
> > 
> > -Arun
> 
> True, the driver would unload but not the transport, but that's intended as
> we're trying to allow the driver to reload and not delete the nvme device in
> the system. If the device deletes, as there may be real consumers of it, they
> will die if not multipathed. A prime example is installing a new driver with a
> bug fix.  I'm far more interested in that behavior than making modprobe -r
> remove the most modules possible.
> 

That makes sense. With that design approach, I agree that waiting in module 
exit is not really required, and a failed modprobe of nvme_fc for a short 
duration is not a big deal.

+1 for your patch.

-Arun


* [EXT] Re: [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-27  0:49         ` James Smart
@ 2019-06-27  1:09           ` Arun Easi
  0 siblings, 0 replies; 10+ messages in thread
From: Arun Easi @ 2019-06-27  1:09 UTC (permalink / raw)


On Wed, 26 Jun 2019, 5:49pm, James Smart wrote:

> On 6/26/2019 5:28 PM, Arun Easi wrote:
> > Attaching the patch I was referring earlier. Maybe the infinite loop in
> > exit could be changed to a time bound one.
> > 
> > -Arun
> > 
> 
> Interesting. I still prefer the stronger referencing while anything is

Yes, no argument there. My suggestion was intended as an addendum to 
your patch.

> outstanding as I think it's really ugly to get into __exit at all when things
> haven't cleaned up, and once in __exit, there's no going back.
> Also concerned with:
> - hopefully module load checking prevents loading of another lldd
>   while in this state. Ugly for that thing if its rejected and the
>   load has to be retried.
> - I don't like the long delays (could be a minute plus) for unload command
> while it waits - anyone stuck in the wait will be asking questions as to why
> - Exiting after a time delay isn't great as it's just as bad as the existing
> code that left with things not cleaned up. I was hoping to correct that.

Yes, again, this was not in lieu of your patch. The idea was that with your 
patch plus this one - with maybe a wait of a few seconds (rather than the 
infinite loop in the patch) in the exit path, to give the nvme_fc unload a 
chance to go through - modprobe -r could unload all the modules cleanly. 
But with the design goal of keeping the nvme device alive, this is 
not required.

-Arun


* [PATCH] nvme-fc: fix module unloads while lports still pending
  2019-06-25 16:06 [PATCH] nvme-fc: fix module unloads while lports still pending James Smart
  2019-06-26 22:20 ` Arun Easi
@ 2019-06-28  7:16 ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2019-06-28  7:16 UTC (permalink / raw)


On Tue, Jun 25, 2019 at 09:06:04AM -0700, James Smart wrote:
> Current code allows the module to be unloaded even if there are
> pending data structures, such as localports and controllers on
> the localports, that have yet to hit their reference counting
> to remove them.
> 
> Fix by taking a reference out on the module upon the first localport
> registered, and which is cleared when the last localport is removed.

This looks broken.  Module unload needs to take everything down
properly, and if needed wait on reference counts to be dropped.  We
don't do that by incrementing the refcount in the module, as that
actually prevents removals.

> +	if (list_empty(&nvme_fc_lport_list) && !try_module_get(THIS_MODULE)) {
> +		spin_unlock_irqrestore(&nvme_fc_lock, flags);
> +		ret = -ENODEV;
> +		goto out_dev_put;
> +	}

And try_module_get(THIS_MODULE) is a classic anti-pattern.  If you know
you already have a reference you must use __module_get().  If you don't
know if there is another held reference using try_module_get on
THIS_MODULE also is unsafe - there is a reason why we always call it
from another module, usually on a struct module in an ops vector.
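The ops-vector pattern Christoph points at can be sketched as follows;
struct sim_module and these helpers are hypothetical stand-ins for struct
module, try_module_get(), and an .owner field, not real kernel API. The key
point is that the consumer module pins the *provider* of the ops vector:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct module. */
struct sim_module {
	int refs;
	bool unloading;
};

/* Stand-in for an ops vector with the usual .owner field, e.g.
 * the template a driver hands to a transport. */
struct sim_ops {
	struct sim_module *owner;
};

/* Stand-in for try_module_get(ops->owner), called from *another*
 * module, so the owner cannot disappear while its ops are in use. */
static bool get_provider(struct sim_ops *ops)
{
	if (!ops->owner || ops->owner->unloading)
		return false;
	ops->owner->refs++;
	return true;
}

static void put_provider(struct sim_ops *ops)
{
	ops->owner->refs--;
}
```

Unlike try_module_get(THIS_MODULE), the reference here is taken across the
module boundary: a module that is already mid-unload simply fails the get,
and a held reference keeps the owner loaded until the consumer drops it.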


end of thread, other threads:[~2019-06-28  7:16 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2019-06-25 16:06 [PATCH] nvme-fc: fix module unloads while lports still pending James Smart
2019-06-26 22:20 ` Arun Easi
2019-06-26 23:00   ` James Smart
2019-06-27  0:02     ` [EXT] " Arun Easi
2019-06-27  0:28       ` Arun Easi
2019-06-27  0:49         ` James Smart
2019-06-27  1:09           ` Arun Easi
2019-06-27  0:34       ` James Smart
2019-06-27  1:01         ` Arun Easi
2019-06-28  7:16 ` Christoph Hellwig
