* [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
@ 2021-06-28  3:14 wenxiong
  2021-06-28  9:07 ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: wenxiong @ 2021-06-28  3:14 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-kernel, james.smart, dwagner, wenxiong, Wen Xiong

From: Wen Xiong <wenxiong@linux.vnet.ibm.com>

Error injection steps:

1. Run: hash ppc64_cpu 2>/dev/null && ppc64_cpu --smt=4
2. Disable one SVC port (at the switch) for 10 minutes
3. Enable the port again
4. Linux crashes

The system has two cores with 16 CPUs, cpu0-cpu15. All CPUs are online
when the system boots up:
core0: cpu0-cpu7 online
core1: cpu8-cpu15 online

After issuing the CPU hotplug command above (ppc64_cpu --smt=4):
cpu0-cpu3 are online
cpu4-cpu7 are offline
cpu8-cpu11 are online
cpu12-cpu15 are offline

After this CPU hotplug operation, the state of the hctxs changes:
- cpu0-cpu3 (online): no change
- cpu4-cpu7 (offline): masked off. The state of each hctx is set to
INACTIVE, and the hctx is reallocated for these CPUs.
- cpu8-cpu11 (online): the CPUs are still active, but their hctxs are
disabled after the hctx reallocation.
- cpu12-cpu15 (offline): masked off. The state of each hctx is set to
INACTIVE, and the hctxs are disabled.

From the nvme/fc driver, the call chain is:
nvme_fc_create_association()
 ->nvme_fc_recreate_io_queues()        (when ctrl->ioq_live == true)
   ->blk_mq_update_nr_hw_queues()
   ->nvme_fc_connect_io_queues()
     ->nvmf_connect_io_queue()

nvme_fc_connect_io_queues(struct nvme_fc_ctrl *ctrl, u16 qsize)
{

        for (i = 1; i < ctrl->ctrl.queue_count; i++) {

                ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);

                set_bit(NVME_FC_Q_LIVE, &ctrl->queues[i].flags);
        }

}

After the CPU hotplug, i loops from 1 to 8. Here is what happens for each i:
i = 1, call blk_mq_alloc_request_hctx with id = 0 ok
i = 2, call blk_mq_alloc_request_hctx with id = 1 ok
i = 3, call blk_mq_alloc_request_hctx with id = 2 ok
i = 4, call blk_mq_alloc_request_hctx with id = 3 ok
i = 5, call blk_mq_alloc_request_hctx with id = 4 crash (cpu = 2048)
i = 6, call blk_mq_alloc_request_hctx with id = 5 crash (cpu = 2048)
i = 7, call blk_mq_alloc_request_hctx with id = 6 crash (cpu = 2048)
i = 8, call blk_mq_alloc_request_hctx with id = 7 crash (cpu = 2048)

The crash happens on the following line in blk_mq_alloc_request_hctx(): when
every CPU in the hctx's cpumask is offline, cpumask_first_and() finds no
online CPU and returns an out-of-range value (>= nr_cpu_ids, 2048 here):

cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
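
__blk_mq_get_ctx(), which consumes that cpu value, is roughly the following
(a simplified sketch based on block/blk-mq.h; the exact code may differ by
kernel version):

static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
						  unsigned int cpu)
{
	/* plain per-CPU lookup, no bounds check on cpu */
	return per_cpu_ptr(q->queue_ctx, cpu);
}

So with cpu == 2048 the returned ctx pointer lands outside the per-CPU area,
and the next dereference of data.ctx crashes.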

This patch fixes the crash seen when bouncing a port on the storage side
combined with CPU hotplug.
---
 block/blk-mq-tag.c | 3 ++-
 block/blk-mq.c     | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..b927233bb6bb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -171,7 +171,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	 * Give up this allocation if the hctx is inactive.  The caller will
 	 * retry on an active hctx.
 	 */
-	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
+	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))
+			&& data->hctx->queue_num > num_online_cpus()) {
 		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
 		return BLK_MQ_NO_TAG;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c86c01bfecdb..5e31bd9b06c2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -436,7 +436,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		.cmd_flags	= op,
 	};
 	u64 alloc_time_ns = 0;
-	unsigned int cpu;
 	unsigned int tag;
 	int ret;
 
@@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = q->queue_hw_ctx[hctx_idx];
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
-	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
-	data.ctx = __blk_mq_get_ctx(q, cpu);
+	data.ctx = __blk_mq_get_ctx(q, hctx_idx);
 
 	if (!q->elevator)
 		blk_mq_tag_busy(data.hctx);
-- 
2.27.0



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-28  3:14 [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port wenxiong
@ 2021-06-28  9:07 ` Daniel Wagner
  2021-06-28  9:59   ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2021-06-28  9:07 UTC (permalink / raw)
  To: wenxiong; +Cc: ming.lei, linux-kernel, james.smart, wenxiong

Hi Wen,

On Sun, Jun 27, 2021 at 10:14:32PM -0500, wenxiong@linux.vnet.ibm.com wrote:
> @@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  	data.hctx = q->queue_hw_ctx[hctx_idx];
>  	if (!blk_mq_hw_queue_mapped(data.hctx))
>  		goto out_queue_exit;
> -	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
> -	data.ctx = __blk_mq_get_ctx(q, cpu);
> +	data.ctx = __blk_mq_get_ctx(q, hctx_idx);

hctx_idx is just an index, not a CPU id. In this scenario, the hctx_idx
used to lookup the context happens to be valid. I am still a bit
confused why [1] doesn't work for this scenario.

As Ming pointed out in [2] we need to update cpumask for CPU hotplug
events.

Thanks,
Daniel

[1] https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@suse.de/
[2] https://lore.kernel.org/linux-nvme/YNXTaUMAFCA84jfZ@T590/


* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-28  9:07 ` Daniel Wagner
@ 2021-06-28  9:59   ` Ming Lei
       [not found]     ` <71d1ce491ed5056bfa921f0e14fa646d@imap.linux.ibm.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-06-28  9:59 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: wenxiong, linux-kernel, james.smart, wenxiong, sagi

On Mon, Jun 28, 2021 at 11:07:03AM +0200, Daniel Wagner wrote:
> Hi Wen,
> 
> On Sun, Jun 27, 2021 at 10:14:32PM -0500, wenxiong@linux.vnet.ibm.com wrote:
> > @@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> >  	data.hctx = q->queue_hw_ctx[hctx_idx];
> >  	if (!blk_mq_hw_queue_mapped(data.hctx))
> >  		goto out_queue_exit;
> > -	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
> > -	data.ctx = __blk_mq_get_ctx(q, cpu);
> > +	data.ctx = __blk_mq_get_ctx(q, hctx_idx);
> 
> hctx_idx is just an index, not a CPU id. In this scenario, the hctx_idx
> used to lookup the context happens to be valid. I am still a bit
> confused why [1] doesn't work for this scenario.

[1] is fine from the blk-mq viewpoint, but nvme needs to improve its
failure handling; otherwise no io queues may be connected in the
worst case.

> 
> As Ming pointed out in [2] we need to update cpumask for CPU hotplug

I mentioned that there is still a hole with your patch; I did not mean that we
need to update the cpumask.

The root cause is that blk-mq doesn't handle tag allocation from a specified
hctx (blk_mq_alloc_request_hctx()) well, and blk-mq assumes that a request
allocation can't cross the hctx inactive/offline transition, see
blk_mq_hctx_notify_offline() and blk_mq_get_tag(). Either the allocated
request is completed or new allocation is prevented before the current hctx
becomes inactive (i.e. every CPU in hctx->cpumask is offline).
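
For reference, here are the two sides of that assumption, condensed for
illustration (simplified, not the literal upstream code):

	/* hotplug side, in blk_mq_hctx_notify_offline(): only act when the
	 * last online CPU mapped to this hctx is going away */
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
		return 0;
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	/* ... then wait for tags already allocated on this hctx to drain */

	/* allocation side, in blk_mq_get_tag(): back out if the hctx went
	 * inactive after the tag was grabbed */
	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
		return BLK_MQ_NO_TAG;
	}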

I tried [1] to move connecting the io queues into the driver and kill
blk_mq_alloc_request_hctx() to address this issue, but there is a corner case
(timeout) that isn't covered.

I understand that NVMe's requirement is that connecting the io queue should
succeed no matter whether the hctx is inactive or not. Sagi, correct me if I
am wrong.


[1]
https://lore.kernel.org/linux-block/fda43a50-a484-dde7-84a1-94ccf9346bdd@broadcom.com/T/#m1e902f69e8503f5e6202945b8b79e5b7252e3689

Thanks,
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
       [not found]     ` <71d1ce491ed5056bfa921f0e14fa646d@imap.linux.ibm.com>
@ 2021-06-29  1:20       ` Ming Lei
       [not found]       ` <OFE573413D.44652DC5-ON00258703.000DB949-00258703.000EFCD4@ibm.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Ming Lei @ 2021-06-29  1:20 UTC (permalink / raw)
  To: wenxiong; +Cc: Daniel Wagner, linux-kernel, james.smart, wenxiong, sagi

Hi Wenxiong,

On Mon, Jun 28, 2021 at 01:17:34PM -0500, wenxiong wrote:
> 
> > 
> > The root cause is that blk-mq doesn't handle tag allocation from a specified
> > hctx (blk_mq_alloc_request_hctx()) well, and blk-mq assumes that a request
> > allocation can't cross the hctx inactive/offline transition, see
> > blk_mq_hctx_notify_offline()
> 
> Hi Ming,
> 
> I tried to pass an online cpu_id (like cpu=8 in my case) to
> blk_mq_alloc_request_hctx(), i.e.
> data.hctx = q->queue_hw_ctx[hctx_idx];
> but it looks like data.hctx came back NULL, so the system crashed when
> accessing data.hctx later.
> 
> blk-mq request allocation can't cross hctx inactive/offline, but blk-mq still
> reallocates the hctx for offline cpus (like cpu=4,5,6,7 in my case) in
> blk_mq_realloc_hw_ctxs(), while the hctx is NULL for online cpus (cpu=8 in my
> case).
> 
> Below is my understanding of the hctxs; please correct me if I am wrong:
> 
> Assume a system has two cores with 16 cpus:
> Before doing cpu hot plug events:
> cpu0-cpu7(core 0) : hctx->state is ACTIVE and q->hctx is not NULL.
> cpu8-cpu15(core 1): hctx->state is ACTIVE and q->hctx is not NULL
> 
> After doing cpu hotplug events (the second half of each core is offline):
> cpu0-cpu3: online, hctx->state is ACTIVE and q->hctx is not NULL.
> cpu4-cpu7: offline, hctx->state is INACTIVE and q->hctx is not NULL
> cpu8-cpu11: online, hctx->state is ACTIVE but q->hctx = NULL
> cpu12-cpu15: offline, hctx->state is INACTIVE and q->hctx = NULL
> 
> So num_online_cpus() is 8 after the cpu hotplug events. Neither way works
> for me, no matter whether I pass 8 online cpus or 4 online/4 offline cpus.
> 
> Is this correct? If nvmf passes online cpu ids to blk-mq, why does it still
> crash/fail?

NVMe users have to pass a correct hctx_idx to blk_mq_alloc_request_hctx(), but
from the info you provided, they don't provide a valid hctx_idx to blk-mq, so
q->queue_hw_ctx[hctx_idx] is NULL and the kernel panics.

I believe Daniel's following patch may fix this specific issue if your
controller is FC:

[1] https://lore.kernel.org/linux-nvme/YNXTaUMAFCA84jfZ@T590/T/#t


Thanks,
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
       [not found]       ` <OFE573413D.44652DC5-ON00258703.000DB949-00258703.000EFCD4@ibm.com>
@ 2021-06-29  2:56         ` Ming Lei
       [not found]         ` <OF8889275F.DC758B38-ON00258703.001297BC-00258703.00143502@ibm.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Ming Lei @ 2021-06-29  2:56 UTC (permalink / raw)
  To: Wen Xiong; +Cc: dwagner, james.smart, linux-kernel, sagi

Hi Wen Xiong,

On Tue, Jun 29, 2021 at 02:43:42AM +0000, Wen Xiong wrote:
>    >> NVMe users have to pass a correct hctx_idx to blk_mq_alloc_request_hctx(),
>    >> but from the info you provided, they don't provide a valid hctx_idx to
>    >> blk-mq, so q->queue_hw_ctx[hctx_idx] is NULL and the kernel panics.
>     
>    Hi Ming,
>     
>    Daniel's two patches didn't fix the crash issue. My patch is on top of
>    those two patches.
>    That is why I am continuing to debug the issue.

Can you provide the dmesg log after applying Daniel's patches?

Yeah, one known issue is that the following line in blk_mq_alloc_request_hctx()
won't work well even though Daniel's patches are applied:

	data.ctx = __blk_mq_get_ctx(q, cpu);

Is that the kernel crash you are observing?

>     
>    What  hctx_idx you suggest to provide to blk-mq for this issue?
>     
>    Before cpu hotplug, num_online_cpus() is 16: 0-15 are online.
>    After cpu hotplug, num_online_cpus() is 8: 0,1,2,3,8,9, 10,11 are online
>    4,5,6,7,12,13,14,15 are offline.
>     
>    What hctx_idx you suggest to provide to blk-mq by calling
>    blk_mq_alloc_request_hctx() in this case?

At least the hctx_idx shouldn't be >= q->nr_hw_queues/set->nr_hw_queues.

Also can you collect the queue mapping log?

#./dump-qmap /dev/nvme1n1


[1] http://people.redhat.com/minlei/tests/tools/dump-qmap


Thanks, 
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
       [not found]         ` <OF8889275F.DC758B38-ON00258703.001297BC-00258703.00143502@ibm.com>
@ 2021-06-29  3:47           ` Ming Lei
  2021-06-29  8:25             ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-06-29  3:47 UTC (permalink / raw)
  To: Wen Xiong; +Cc: dwagner, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 03:40:42AM +0000, Wen Xiong wrote:
>    >>Can you provide the dmesg log after applying Daniel's patches?
>     
>    Below is the crash kernel stack with Daniel's patches (I got this last
>    week). I can get the full dmesg for you tomorrow (it's late here).
>     
> 
>  root @ ltcfleet1-lp8: /root
>  # [  596.448859] Oops: Kernel access of bad area, sig: 11 [#1]
>  [  596.448871] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>  [  596.448879] Modules linked in: rpadlpar_io(E) rpaphp(E) xsk_diag(E) tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) xt_tcpudp(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) rfkill(E) xfs(E) dm_service_time(E) ibmveth(E) vmx_crypto(E) gf128mul(E) crct10dif_vpmsum(E) uio_pdrv_genirq(E) uio(E) rtc_generic(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) backlight(E) agpgart(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) raid6_pq(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) lpfc(E) nvmet_fc(E) nvmet(E) configfs(E) nvme_fc(E) nvme_fabrics(E) crc32c_vpmsum(E) nvme_core(E) t10_pi(E) scsi_transport_fc(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
>  [  596.448973] CPU: 2 PID: 2665 Comm: kworker/u32:5 Kdump: loaded Tainted: G            E     5.13.0-rc6-50-default+ #11
>  [  596.448982] Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
>  [  596.448990] NIP:  c000000000747690 LR: c000000000747a80 CTR: c0000000008451f0
>  [  596.448997] REGS: c000000034ad3510 TRAP: 0380   Tainted: G            E      (5.13.0-rc6-50-default+)
>  [  596.449004] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48002224  XER: 00000000
>  [  596.449021] CFAR: c000000000747a7c IRQMASK: 0
>  [  596.449021] GPR00: c000000000747a80 c000000034ad37b0 c000000001982900 c00000002d39b280
>  [  596.449021] GPR04: 8009fff80c295c00 0000000000000000 0000000000000000 0000000000000001
>  [  596.449021] GPR08: 00000007faf50000 0000000000000000 0000000000000000 0000000000000001
>  [  596.449021] GPR12: 0000000000002000 c000000007fcd680 c000000000188608 c0000007ffa9b200
>  [  596.449021] GPR16: 0000000000000190 0000000000000600 0000000000001f00 0000000000000000
>  [  596.449021] GPR20: 0000000000000004 0700000048000000 c00000002abe0460 0000000000000000
>  [  596.449021] GPR24: c00000002cd45400 0000000000000400 0000000000000000 c00000002d39b280
>  [  596.449021] GPR28: fffffffffffffff5 c00000002d39b2d8 0000000000000000 c000000034ad38c0
>  [  596.449097] NIP [c000000000747690] blk_mq_put_tag+0x20/0x90
>  [  596.449104] LR [c000000000747a80] blk_mq_get_tag+0x380/0x3b0
>  [  596.449111] Call Trace:
>  [  596.449114] [c000000034ad37b0] [c000000000746e64] __blk_mq_get_tag+0x44/0x160 (unreliable)
>  [  596.449124] [c000000034ad37d0] [c000000000747a80] blk_mq_get_tag+0x380/0x3b0
>  [  596.449132] [c000000034ad3860] [c00000000073e24c] blk_mq_alloc_request_hctx+0x1bc/0x240
>  [  596.449140] [c000000034ad3920] [c0080000003945c8] __nvme_submit_sync_cmd+0xa0/0x350 [nvme_core]
>  [  596.449154] [c000000034ad39c0] [c0080000004117b0] nvmf_connect_io_queue+0x148/0x1f0 [nvme_fabrics]
>  [  596.449164] [c000000034ad3ab0] [c008000000377ab0] nvme_fc_connect_io_queues+0x2d8/0x36c [nvme_fc]
>  [  596.449174] [c000000034ad3b90] [c00800000037666c] nvme_fc_connect_ctrl_work+0xa44/0xc30 [nvme_fc]
>  [  596.449183] [c000000034ad3c60] [c00000000017d914] process_one_work+0x2f4/0x610
>  [  596.449191] [c000000034ad3d00] [c00000000017dcb8] worker_thread+0x88/0x6a0
>  [  596.449199] [c000000034ad3da0] [c00000000018879c] kthread+0x19c/0x1b0
>  [  596.449207] [c000000034ad3e10] [c00000000000d6ec] ret_from_kernel_thread+0x5c/0x70
>  [  596.449215] Instruction dump:
>  [  596.449220] e8010030 38210020 7c0803a6 4e800020 3c4c0124 3842b290 7c0802a6 60000000
>  [  596.449233] 7c0802a6 7ca92b78 f8010010 f821ffe1 <80a40080> 80830004 7f892040 419c0044
>  [  596.449246] ---[ end trace c4525398a0ebf111 ]---
>  [  596.470968]
>  [    0.000000] hash-mmu: Page sizes from device-tree:
>  [    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000,
> 
>    >>data.ctx = __blk_mq_get_ctx(q, cpu);
>    cpu=2048 if hctx_idx = 4

Yeah, that is the issue I mentioned, any CPU in hctx->cpumask becomes
offline, please try the following patch and see if it makes a
difference:


diff --git a/block/blk-mq.c b/block/blk-mq.c
index df5dc3b756f5..74632f50d969 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -494,7 +494,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = q->queue_hw_ctx[hctx_idx];
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
-	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
+	cpu = cpumask_first(data.hctx->cpumask);
 	data.ctx = __blk_mq_get_ctx(q, cpu);
 
 	if (!q->elevator)
@@ -2570,6 +2570,10 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
 	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
 		return 0;
 
+	/* Controller doesn't use managed IRQ, no need to deactivate hctx */
+	if (hctx->flags & BLK_MQ_F_NOT_USE_MANAGED_IRQ)
+		return 0;
+
 	/*
 	 * Prevent new request from being allocated on the current hctx.
 	 *
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 256e87721a01..c563a2b6e9fc 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2876,7 +2876,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 	ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
 	ctrl->tag_set.reserved_tags = NVMF_RESERVED_TAGS;
 	ctrl->tag_set.numa_node = ctrl->ctrl.numa_node;
-	ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
+		BLK_MQ_F_NOT_USE_MANAGED_IRQ;
 	ctrl->tag_set.cmd_size =
 		struct_size((struct nvme_fcp_op_w_sgl *)NULL, priv,
 			    ctrl->lport->ops->fcprqst_priv_sz);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 21140132a30d..600c5dd1a069 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -403,6 +403,7 @@ enum {
 	 */
 	BLK_MQ_F_STACKING	= 1 << 2,
 	BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3,
+	BLK_MQ_F_NOT_USE_MANAGED_IRQ = 1 << 4,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,

Thanks,
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  3:47           ` Ming Lei
@ 2021-06-29  8:25             ` Daniel Wagner
  2021-06-29  8:35               ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2021-06-29  8:25 UTC (permalink / raw)
  To: Ming Lei; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 11:47:55AM +0800, Ming Lei wrote:
> >    >>data.ctx = __blk_mq_get_ctx(q, cpu);
> >    cpu=2048 if hctx_idx = 4
> 
> Yeah, that is the issue I mentioned, any CPU in hctx->cpumask becomes
> offline, please try the following patch and see if it makes a
> difference:

Given that cpumask_first_and() will return nr_cpu_ids in this case,
can't we just bail out here and have the caller handle the error? I am
able to reproduce the crash you reported in [1], and the fix [2] here
works for me:

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -469,6 +469,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
        if (!blk_mq_hw_queue_mapped(data.hctx))
                goto out_queue_exit;
        cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
+       if (cpu >= nr_cpu_ids)
+               goto out_queue_exit;
        data.ctx = __blk_mq_get_ctx(q, cpu);
 
        if (!q->elevator)

[1] https://lore.kernel.org/linux-block/20191117041233.GA30615@ming.t460p/
[2] https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@suse.de/


* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  8:25             ` Daniel Wagner
@ 2021-06-29  8:35               ` Daniel Wagner
  2021-06-29  9:01                 ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2021-06-29  8:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 10:25:43AM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 11:47:55AM +0800, Ming Lei wrote:
> > >    >>data.ctx = __blk_mq_get_ctx(q, cpu);
> > >    cpu=2048 if hctx_idx = 4
> > 
> > Yeah, that is the issue I mentioned, any CPU in hctx->cpumask becomes
> > offline, please try the following patch and see if it makes a
> > difference:
> 
> Given that cpumask_first_and() will return nr_cpu_ids in this case,
> can't we just bail out here and have the caller handle the error?

To answer my own question, you want to avoid adding the if into the
hotpath.


* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  8:35               ` Daniel Wagner
@ 2021-06-29  9:01                 ` Ming Lei
  2021-06-29  9:27                   ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-06-29  9:01 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 10:35:49AM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 10:25:43AM +0200, Daniel Wagner wrote:
> > On Tue, Jun 29, 2021 at 11:47:55AM +0800, Ming Lei wrote:
> > > >    >>data.ctx = __blk_mq_get_ctx(q, cpu);
> > > >    cpu=2048 if hctx_idx = 4
> > > 
> > > Yeah, that is the issue I mentioned, any CPU in hctx->cpumask becomes
> > > offline, please try the following patch and see if it makes a
> > > difference:
> > 
> > Given that cpumask_first_and() will return nr_cpu_ids in this case,
> > can't we just bail out here and have the caller handle the error?
> 
> To answer my own question, you want to avoid adding the if into the
> hotpath.

No, this way fails the request allocation, which isn't expected from
NVMe fc/rdma/tcp/loop, since io queue can't be connected in this way.


Thanks,
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  9:01                 ` Ming Lei
@ 2021-06-29  9:27                   ` Daniel Wagner
  2021-06-29  9:35                     ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2021-06-29  9:27 UTC (permalink / raw)
  To: Ming Lei; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 05:01:16PM +0800, Ming Lei wrote:
> No, this way fails the request allocation, which isn't expected from
> NVMe fc/rdma/tcp/loop, since io queue can't be connected in this way.

But __nvme_submit_sync_cmd() can return errors too and they need to be
handled in the connect path. So why is this so special? Not that I am
against your patch, I just like to understand the reasoning.


* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  9:27                   ` Daniel Wagner
@ 2021-06-29  9:35                     ` Ming Lei
  2021-06-29  9:49                       ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-06-29  9:35 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 11:27:19AM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 05:01:16PM +0800, Ming Lei wrote:
> > No, this way fails the request allocation, which isn't expected from
> > NVMe fc/rdma/tcp/loop, since io queue can't be connected in this way.
> 
> But __nvme_submit_sync_cmd() can return errors too and they need to be
> handled in the connect path. So why is this so special? Not that I am
> against your patch, I just like to understand the reasoning.

With the two patches I posted, __nvme_submit_sync_cmd() shouldn't return
error, can you observe the error?


Thanks, 
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  9:35                     ` Ming Lei
@ 2021-06-29  9:49                       ` Daniel Wagner
  2021-06-29 10:06                         ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2021-06-29  9:49 UTC (permalink / raw)
  To: Ming Lei; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 05:35:51PM +0800, Ming Lei wrote:
> With the two patches I posted, __nvme_submit_sync_cmd() shouldn't return
> error, can you observe the error?

There are still ways the allocation can fail:

        ret = blk_queue_enter(q, flags);
        if (ret)
                return ERR_PTR(ret);

        ret = -EXDEV;
        data.hctx = q->queue_hw_ctx[hctx_idx];
        if (!blk_mq_hw_queue_mapped(data.hctx))
                goto out_queue_exit;

No, I don't see any errors. I am still trying to reproduce it on real
hardware. The setup with blktests running in Qemu did work with all
patches applied (the ones from me and your patches).

About the error argument: Later in the code path, e.g. in
__nvme_submit_sync_cmd() transport errors (incl. canceled request) are
handled as well, hence the upper layer will see errors during connection
attempts. My point is, there is nothing special about the connection
attempt failing. We have error handling code in place and the above
state machine has to deal with it.
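
To illustrate the kind of caller-side handling I mean, this is roughly what
the fc connect loop does (condensed, error path details trimmed, so treat it
as a sketch rather than the exact code):

	ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
	if (ret)
		break;	/* connect failed; the caller's error handling takes over */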

Anyway, avoiding the if in the hotpath is a good thing. I just don't
think your argument that no error can happen is correct.


* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29  9:49                       ` Daniel Wagner
@ 2021-06-29 10:06                         ` Ming Lei
  2021-06-29 11:50                           ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-06-29 10:06 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 11:49:38AM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 05:35:51PM +0800, Ming Lei wrote:
> > With the two patches I posted, __nvme_submit_sync_cmd() shouldn't return
> > error, can you observe the error?
> 
> There are still ways the allocation can fail:
> 
>         ret = blk_queue_enter(q, flags);
>         if (ret)
>                 return ERR_PTR(ret);
> 
>         ret = -EXDEV;
>         data.hctx = q->queue_hw_ctx[hctx_idx];
>         if (!blk_mq_hw_queue_mapped(data.hctx))
>                 goto out_queue_exit;

The above failure is supposed to be handled as an error: either the queue is
frozen or the hctx is unmapped.

> 
> No, I don't see any errors. I am still trying to reproduce it on real
> hardware. The setup with blktests running in Qemu did work with all
> patches applied (the ones from me and your patches).
> 
> About the error argument: Later in the code path, e.g. in
> __nvme_submit_sync_cmd() transport errors (incl. canceled request) are
> handled as well, hence the upper layer will see errors during connection
> attempts. My point is, there is nothing special about the connection
> attempt failing. We have error handling code in place and the above
> state machine has to deal with it.

My two patches not only avoid the kernel panic, but also allow the
request to be allocated successfully, so the connect io queue request can
be submitted to the driver even though all CPUs in hctx->cpumask are offline,
and nvmef can be set up correctly.

That is the difference from yours, which fails the request allocation; then
the io queues can't be connected, and the whole host can't be set up
successfully and becomes a brick. The point is that offlining CPUs shouldn't
cause the nvme fc/rdma/tcp/loop setup to fail.

> 
> Anyway, avoiding the if in the hotpath is a good thing. I just don't
> think your argument that no error can happen is correct.

Again, it isn't related to avoiding the if, and it isn't in the hotpath
at all.

Thanks,
Ming



* Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
  2021-06-29 10:06                         ` Ming Lei
@ 2021-06-29 11:50                           ` Daniel Wagner
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Wagner @ 2021-06-29 11:50 UTC (permalink / raw)
  To: Ming Lei; +Cc: Wen Xiong, james.smart, linux-kernel, sagi, wenxiong

On Tue, Jun 29, 2021 at 06:06:21PM +0800, Ming Lei wrote:
> > No, I don't see any errors. I am still trying to reproduce it on real
> > hardware. The setup with blktests running in Qemu did work with all
> > patches applied (the ones from me and your patches).
> > 
> > About the error argument: Later in the code path, e.g. in
> > __nvme_submit_sync_cmd() transport errors (incl. canceled request) are
> > handled as well, hence the upper layer will see errors during connection
> > attempts. My point is, there is nothing special about the connection
> > attempt failing. We have error handling code in place and the above
> > state machine has to deal with it.
> 
> My two patches not only avoid the kernel panic, but also allow the
> request to be allocated successfully, so the connect io queue request can
> be submitted to the driver even though all CPUs in hctx->cpumask are offline,
> and nvmef can be set up correctly.
> 
> That is the difference from yours, which fails the request allocation; then
> the io queues can't be connected, and the whole host can't be set up
> successfully and becomes a brick. The point is that offlining CPUs shouldn't
> cause the nvme fc/rdma/tcp/loop setup to fail.

Right, I think I see your point now.

> > Anyway, avoiding the if in the hotpath is a good thing. I just don't
> > think your argument that no error can happen is correct.
> 
> Again, it isn't related to avoiding the if, and it isn't in the hotpath
> at all.

I mixed up blk_mq_alloc_request() with blk_mq_alloc_request_hctx().

Thanks for the explanation. I'll keep trying to replicate the problem
on real hardware and see if these patches mitigate it.

Thanks,
Daniel

