* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
@ 2018-01-14 21:00 Minwoo Im
  2018-01-15  2:00 ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Minwoo Im @ 2018-01-14 21:00 UTC (permalink / raw)


If either the create-sq or create-cq command fails with a positive NVMe
status code, the (result < 0) check does not catch it, so
nvme_create_queue() goes on to call nvme_init_queue() for a queue that
was never created.
To prevent this inconsistency in queue_count, check for any non-zero
result and free the queue(s) already allocated in nvme_create_io_queues()
when nvme_create_queue() fails.

Changes from V1:
    - Move calling nvme_free_queues() to nvme_create_io_queues() instead of
      nvme_create_queue(). (Sagi)

Signed-off-by: Minwoo Im <minwoo.im.dev at gmail.com>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/pci.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d53550e..ecef45a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1456,11 +1456,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
 
 	nvmeq->cq_vector = qid - 1;
 	result = adapter_alloc_cq(dev, qid, nvmeq);
-	if (result < 0)
+	if (result)
 		return result;
 
 	result = adapter_alloc_sq(dev, qid, nvmeq);
-	if (result < 0)
+	if (result)
 		goto release_cq;
 
 	nvme_init_queue(nvmeq, qid);
@@ -1476,7 +1476,6 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
 	adapter_delete_cq(dev, qid);
 	return result;
 }
-
 static const struct blk_mq_ops nvme_mq_admin_ops = {
 	.queue_rq	= nvme_queue_rq,
 	.complete	= nvme_pci_complete_rq,
@@ -1637,8 +1636,10 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
 	max = min(dev->max_qid, dev->ctrl.queue_count - 1);
 	for (i = dev->online_queues; i <= max; i++) {
 		ret = nvme_create_queue(dev->queues[i], i);
-		if (ret)
+		if (ret) {
+			nvme_free_queues(dev, dev->online_queues);
 			break;
+		}
 	}
 
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-14 21:00 [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong Minwoo Im
@ 2018-01-15  2:00 ` Keith Busch
  2018-01-17 13:07   ` Minwoo Im
  2018-01-17 15:00   ` Minwoo Im
  0 siblings, 2 replies; 8+ messages in thread
From: Keith Busch @ 2018-01-15  2:00 UTC (permalink / raw)


On Mon, Jan 15, 2018@06:00:50AM +0900, Minwoo Im wrote:
>  static const struct blk_mq_ops nvme_mq_admin_ops = {
>  	.queue_rq	= nvme_queue_rq,
>  	.complete	= nvme_pci_complete_rq,
> @@ -1637,8 +1636,10 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
>  	max = min(dev->max_qid, dev->ctrl.queue_count - 1);
>  	for (i = dev->online_queues; i <= max; i++) {
>  		ret = nvme_create_queue(dev->queues[i], i);
> -		if (ret)
> +		if (ret) {
> +			nvme_free_queues(dev, dev->online_queues);
>  			break;
> +		}
>  	}

Unless this is the very first pass at initialisation, I don't think we
can free queues until after blk_mq_update_nr_hw_queues since the hctx
could otherwise point to freed memory.


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-15  2:00 ` Keith Busch
@ 2018-01-17 13:07   ` Minwoo Im
  2018-01-17 15:00   ` Minwoo Im
  1 sibling, 0 replies; 8+ messages in thread
From: Minwoo Im @ 2018-01-17 13:07 UTC (permalink / raw)


On Mon, Jan 15, 2018@11:00 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Mon, Jan 15, 2018@06:00:50AM +0900, Minwoo Im wrote:
>>  static const struct blk_mq_ops nvme_mq_admin_ops = {
>>       .queue_rq       = nvme_queue_rq,
>>       .complete       = nvme_pci_complete_rq,
>> @@ -1637,8 +1636,10 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
>>       max = min(dev->max_qid, dev->ctrl.queue_count - 1);
>>       for (i = dev->online_queues; i <= max; i++) {
>>               ret = nvme_create_queue(dev->queues[i], i);
>> -             if (ret)
>> +             if (ret) {
>> +                     nvme_free_queues(dev, dev->online_queues);
>>                       break;
>> +             }
>>       }
>
> Unless this is the very first pass at initialisation, I don't think we
> can free queues until after blk_mq_update_nr_hw_queues since the hctx
> could otherwise point to freed memory.

Keith,

I agree that if not all create io sq and cq commands fail,
blk_mq_update_nr_hw_queues() will update nr_hw_queues and the hw ctxs
will be freed.

However, if it is not the very first initialization and *all* create io
sq and cq commands fail in nvme_reset_work() because of a controller
reset (e.g. # nvme reset /dev/nvme0), it seems blk_mq_update_nr_hw_queues()
cannot update nr_hw_queues, because that function will not update it
when the new value is zero.

Would it be better to ask blk-mq to free all hw ctxs instead of relying
on blk_mq_update_nr_hw_queues()?

Thanks,


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-15  2:00 ` Keith Busch
  2018-01-17 13:07   ` Minwoo Im
@ 2018-01-17 15:00   ` Minwoo Im
  2018-01-18  5:27     ` jianchao.wang
  1 sibling, 1 reply; 8+ messages in thread
From: Minwoo Im @ 2018-01-17 15:00 UTC (permalink / raw)


On Mon, Jan 15, 2018@11:00 AM, Keith Busch <keith.busch@intel.com> wrote:
> Unless this is the very first pass at initialisation, I don't think we
> can free queues until after blk_mq_update_nr_hw_queues since the hctx
> could otherwise point to freed memory.

If it is not the very first initialization and (_online_queues_ < 2),
the driver will kill the queues and remove the namespaces. I thought
this "killing queues" path would handle what you were concerned about.
If I have misunderstood, please let me know.

Otherwise, if (_online_queues_ >= 2), which means at least one IO queue
has been prepared, blk_mq_update_nr_hw_queues() will be triggered as
you said.


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-17 15:00   ` Minwoo Im
@ 2018-01-18  5:27     ` jianchao.wang
  2018-01-18 10:25       ` Minwoo Im
  0 siblings, 1 reply; 8+ messages in thread
From: jianchao.wang @ 2018-01-18  5:27 UTC (permalink / raw)


Hi Minwoo 

On 01/17/2018 11:00 PM, Minwoo Im wrote:
> On Mon, Jan 15, 2018@11:00 AM, Keith Busch <keith.busch@intel.com> wrote:
>> Unless this is the very first pass at initialisation, I don't think we
>> can free queues until after blk_mq_update_nr_hw_queues since the hctx
>> could otherwise point to freed memory.
> 
> If it is not the very first initialization and (_online_queues_ < 2),
> the driver will kill the queues and remove the namespaces. I thought
> this "killing queues" path would handle what you were concerned about.
> If I have misunderstood, please let me know.

Think of the following scenario:
nvme_reset_work
  -> nvme_setup_io_queues
    -> nvme_create_io_queues
      -> nvme_free_queues
  -> nvme_kill_queues
    -> blk_set_queue_dying   // just freezes the queue here, but does not wait for it to be drained.
                                no new requests can come in, but there may still be residual requests in the blk-mq queues.
    -> blk_mq_unquiesce_queue

the queues are _unquiesced_ here, then the residual requests will be queued
and go through nvme_queue_rq. Then the freed nvme_queue structure will be accessed.
:)

Thanks
Jianchao

> Otherwise if (_online_queues_ >= 2) which means that
> at least one or more IO queue is prepared, blk_mq_update_nr_hw_queues()
> will be triggered as you said.
> 


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-18  5:27     ` jianchao.wang
@ 2018-01-18 10:25       ` Minwoo Im
  2018-01-18 11:31         ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Minwoo Im @ 2018-01-18 10:25 UTC (permalink / raw)


On Thu, Jan 18, 2018 at 2:27 PM, jianchao.wang
<jianchao.w.wang@oracle.com> wrote:
> Hi Minwoo

> Think of the following scenario:
> nvme_reset_work
>   -> nvme_setup_io_queues
>     -> nvme_create_io_queues
>       -> nvme_free_queues
>   -> nvme_kill_queues
>     -> blk_set_queue_dying   // just freezes the queue here, but does not wait for it to be drained.
>                                 no new requests can come in, but there may still be residual requests in the blk-mq queues.
>     -> blk_mq_unquiesce_queue
>
> the queues are _unquiesced_ here, then the residual requests will be queued
> and go through nvme_queue_rq. Then the freed nvme_queue structure will be accessed.
> :)
>
> Thanks
> Jianchao

Hi Jianchao,

First of all, I really appreciate letting me know the case.
It seems no one updates the actual nr_hw_queues value and frees hctxs
after nvme_kill_queues().
If you don't mind, would you please tell me where hctxs are freed
after nvme_kill_queues()?
I'm still trying to understand what's going on in the blk-mq layer, but
I'm having a hard time getting to it.

Thanks,


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-18 10:25       ` Minwoo Im
@ 2018-01-18 11:31         ` Keith Busch
  2018-01-18 22:52           ` Minwoo Im
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2018-01-18 11:31 UTC (permalink / raw)


On Thu, Jan 18, 2018@07:25:06PM +0900, Minwoo Im wrote:
> On Thu, Jan 18, 2018 at 2:27 PM, jianchao.wang
> <jianchao.w.wang@oracle.com> wrote:
> > Hi Minwoo
> 
> > Think of the following scenario:
> > nvme_reset_work
> >   -> nvme_setup_io_queues
> >     -> nvme_create_io_queues
> >       -> nvme_free_queues
> >   -> nvme_kill_queues
> >     -> blk_set_queue_dying   // just freezes the queue here, but does not wait for it to be drained.
> >                                 no new requests can come in, but there may still be residual requests in the blk-mq queues.
> >     -> blk_mq_unquiesce_queue
> >
> > the queues are _unquiesced_ here, then the residual requests will be queued
> > and go through nvme_queue_rq. Then the freed nvme_queue structure will be accessed.
> > :)
> >
> > Thanks
> > Jianchao
> 
> Hi Jianchao,
> 
> First of all, I really appreciate letting me know the case.
> It seems no one updates the actual nr_hw_queues value and frees hctxs
> after nvme_kill_queues().
> If you don't mind, would you please tell me where hctxs are freed
> after nvme_kill_queues()?

The API doesn't let us set the nr_hw_queues to 0. We'd have to free the
tagset at that point, but we don't free it until the last open reference
is dropped. I can't seem to recall why that's necessary, but I'll stare
at this a bit longer to see if it makes sense. The memory the driver is
holding onto is not really a big deal either way.


* [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong
  2018-01-18 11:31         ` Keith Busch
@ 2018-01-18 22:52           ` Minwoo Im
  0 siblings, 0 replies; 8+ messages in thread
From: Minwoo Im @ 2018-01-18 22:52 UTC (permalink / raw)


On Thu, Jan 18, 2018@8:31 PM, Keith Busch <keith.busch@intel.com> wrote:
>
> The API doesn't let us set the nr_hw_queues to 0. We'd have to free the
> tagset at that point, but we don't free it until the last open reference
> is dropped. I can't seem to recall why that's necessary, but I'll stare
> at this a bit longer to see if it makes sense. The memory the driver is
> holding onto is not really a big deal either way.

Thank you for your time, Keith.

I was wondering whether I would have to clear the whole tagset or not.
But as you said, if the memory the driver holds onto is not really
important and no one is referring to it, then we can skip it.


end of thread, other threads:[~2018-01-18 22:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-14 21:00 [PATCH V2] nvme: free pre-allocated queue if create ioq goes wrong Minwoo Im
2018-01-15  2:00 ` Keith Busch
2018-01-17 13:07   ` Minwoo Im
2018-01-17 15:00   ` Minwoo Im
2018-01-18  5:27     ` jianchao.wang
2018-01-18 10:25       ` Minwoo Im
2018-01-18 11:31         ` Keith Busch
2018-01-18 22:52           ` Minwoo Im
