* [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
@ 2021-06-28  3:14 wenxiong
From: wenxiong @ 2021-06-28  3:14 UTC
  To: ming.lei; +Cc: linux-kernel, james.smart, dwagner, wenxiong, Wen Xiong

From: Wen Xiong <wenxiong@linux.vnet.ibm.com>

Error injection steps:

1. Run: hash ppc64_cpu 2>/dev/null && ppc64_cpu --smt=4
2. Disable one SVC port (at the switch) for 10 minutes
3. Enable the port again
4. Linux crashes

The system has two cores with 16 CPUs, cpu0-cpu15. All CPUs
are online when the system boots up:
core0: cpu0-cpu7 online
core1: cpu8-cpu15 online

Issuing the cpu hotplug command above (ppc64_cpu --smt=4) leaves
the CPUs in the following state:
cpu0-cpu3 are online
cpu4-cpu7 are offline
cpu8-cpu11 are online
cpu12-cpu15 are offline

After this cpu hotplug operation, the state of the hctxs changes:
- cpu0-cpu3 (online): no change
- cpu4-cpu7 (offline): masked off; the state of each hctx is set to
INACTIVE, and the hctx is reallocated for these CPUs.
- cpu8-cpu11 (online): the CPUs are still active, but their hctxs are
disabled after the hctx reallocation.
- cpu12-cpu15 (offline): masked off; the state of each hctx is set to
INACTIVE, and the hctxs are disabled.
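
For context, this is roughly how blk-mq marks an hctx INACTIVE when
its CPUs go away: a simplified sketch of the v5.13-era
blk_mq_hctx_notify_offline() in block/blk-mq.c (drain logic trimmed),
not the exact upstream code:

static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
			struct blk_mq_hw_ctx, cpuhp_online);

	/* Only act when @cpu is the last online CPU mapped to this hctx. */
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    cpumask_first_and(hctx->cpumask, cpu_online_mask) != cpu)
		return 0;

	/* Fail further tag allocations on this hctx; blk_mq_get_tag()
	 * tests this bit and returns BLK_MQ_NO_TAG (see the blk-mq-tag.c
	 * hunk below). */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb__after_atomic();

	/* ... wait for requests already in flight on this hctx ... */
	return 0;
}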

From the nvme/fc driver:
nvme_fc_create_association()
  ->nvme_fc_recreate_io_queues()        /* if ctrl->ioq_live == true */
    ->blk_mq_update_nr_hw_queues()
    ->nvme_fc_connect_io_queues()
      ->nvmf_connect_io_queue()

static int
nvme_fc_connect_io_queues(struct nvme_fc_ctrl *ctrl, u16 qsize)
{
	int i, ret = 0;

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		/* Send the Connect command on I/O queue i. */
		ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
		if (ret)
			break;

		set_bit(NVME_FC_Q_LIVE, &ctrl->queues[i].flags);
	}

	return ret;
}
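
nvmf_connect_io_queue() submits the Connect command on queue i, and
the nvme core allocates that request directly on hctx i - 1. Roughly
(simplified from the v5.13-era drivers/nvme/host/core.c; wrapper names
vary across kernel versions):

static struct request *nvme_alloc_request_qid(struct request_queue *q,
		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid)
{
	/* qid is 1-based for I/O queues while hctx indexes are 0-based,
	 * so the Connect for queue i allocates from hctx i - 1
	 * (request initialization trimmed). */
	return blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags,
			qid ? qid - 1 : 0);
}

This is why the loop above, with i running from 1 to 8, drives hctx
ids 0-7 through blk_mq_alloc_request_hctx().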

After the cpu hotplug, i loops from 1 to 8. Here is what happens for
each value of i (the hctx id passed down is i - 1):
i = 1: blk_mq_alloc_request_hctx() with id = 0: ok
i = 2: blk_mq_alloc_request_hctx() with id = 1: ok
i = 3: blk_mq_alloc_request_hctx() with id = 2: ok
i = 4: blk_mq_alloc_request_hctx() with id = 3: ok
i = 5: blk_mq_alloc_request_hctx() with id = 4: crash (cpu = 2048)
i = 6: blk_mq_alloc_request_hctx() with id = 5: crash (cpu = 2048)
i = 7: blk_mq_alloc_request_hctx() with id = 6: crash (cpu = 2048)
i = 8: blk_mq_alloc_request_hctx() with id = 7: crash (cpu = 2048)

The crash comes from the CPU lookup in blk_mq_alloc_request_hctx():
after the hotplug above, none of the CPUs in data.hctx->cpumask is
online for hctx 4-7, so cpumask_first_and(data.hctx->cpumask,
cpu_online_mask) returns nr_cpu_ids (2048 on this configuration)
instead of a valid CPU number.
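
For reference, here is the pre-patch path with the failure annotated
(excerpt from blk_mq_alloc_request_hctx() in block/blk-mq.c, matching
the lines removed in the hunk below; comments added):

	data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;
	/* For hctx 4-7 the intersection with cpu_online_mask is empty
	 * after the hotplug, so this returns nr_cpu_ids (2048) rather
	 * than a valid CPU number. */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	/* __blk_mq_get_ctx() does a per-CPU lookup; indexing the per-cpu
	 * queue contexts with cpu == 2048 crashes the kernel. */
	data.ctx = __blk_mq_get_ctx(q, cpu);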

This patch fixes the crash seen when bouncing a port on the storage
side while CPUs are hotplugged.
---
 block/blk-mq-tag.c | 3 ++-
 block/blk-mq.c     | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..b927233bb6bb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -171,7 +171,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	 * Give up this allocation if the hctx is inactive.  The caller will
 	 * retry on an active hctx.
 	 */
-	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
+	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))
+			&& data->hctx->queue_num > num_online_cpus()) {
 		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
 		return BLK_MQ_NO_TAG;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c86c01bfecdb..5e31bd9b06c2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -436,7 +436,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		.cmd_flags	= op,
 	};
 	u64 alloc_time_ns = 0;
-	unsigned int cpu;
 	unsigned int tag;
 	int ret;
 
@@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = q->queue_hw_ctx[hctx_idx];
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
-	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
-	data.ctx = __blk_mq_get_ctx(q, cpu);
+	data.ctx = __blk_mq_get_ctx(q, hctx_idx);
 
 	if (!q->elevator)
 		blk_mq_tag_busy(data.hctx);
-- 
2.27.0

