From: Ming Lei <ming.lei@redhat.com> To: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@fb.com>, linux-block@vger.kernel.org Cc: Bart Van Assche <bart.vanassche@sandisk.com>, Keith Busch <keith.busch@intel.com>, Sagi Grimberg <sagi@grimberg.me>, Yi Zhang <yi.zhang@redhat.com>, Johannes Thumshirn <jthumshirn@suse.de>, Ming Lei <ming.lei@redhat.com> Subject: [PATCH V2 4/6] blk-mq: avoid to map CPU into stale hw queue Date: Thu, 14 Dec 2017 10:31:01 +0800 [thread overview] Message-ID: <20171214023103.18272-5-ming.lei@redhat.com> (raw) In-Reply-To: <20171214023103.18272-1-ming.lei@redhat.com> blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its previous map isn't cleared yet, and may point to one stale hw queue index. This patch fixes the following issue by clearing the mapping table before setting it up in blk_mq_pci_map_queues(). This patches fixes this following issue reported by Zhang Yi: [ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f [ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200 [ 101.216346] PGD 0 P4D 0 [ 101.219171] Oops: 0000 [#1] SMP [ 101.222674] Modules linked in: sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mxm_wmi intel_rapl_perf iTCO_wdt ipmi_si ipmi_devintf pcspkr iTCO_vendor_support sg dcdbas ipmi_msghandler wmi mei_me lpc_ich shpchp mei acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel libata tg3 nvme nvme_core megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 101.284881] CPU: 0 PID: 504 Comm: kworker/u25:5 Not tainted 4.15.0-rc2 #1 [ 101.292455] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017 [ 101.301001] Workqueue: nvme-wq nvme_reset_work [nvme] [ 101.306636] task: 00000000f2c53190 task.stack: 000000002da874f9 [ 101.313241] RIP: 0010:blk_mq_map_swqueue+0xbc/0x200 [ 101.318681] RSP: 0018:ffffc9000234fd70 EFLAGS: 00010282 [ 101.324511] RAX: ffff88047ffc9480 RBX: ffff88047e130850 RCX: 0000000000000000 [ 101.332471] RDX: ffffe8ffffd40580 RSI: ffff88047e509b40 RDI: ffff88046f37a008 [ 101.340432] RBP: 000000000000000b R08: ffff88046f37a008 R09: 0000000011f94280 [ 101.348392] R10: ffff88047ffd4d00 R11: 0000000000000000 R12: ffff88046f37a008 [ 101.356353] R13: ffff88047e130f38 R14: 000000000000000b R15: ffff88046f37a558 [ 101.364314] FS: 0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000 [ 101.373342] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.379753] CR2: 0000000000000098 CR3: 000000047f409004 CR4: 00000000001606f0 [ 101.387714] Call Trace: [ 101.390445] blk_mq_update_nr_hw_queues+0xbf/0x130 [ 101.395791] nvme_reset_work+0x6f4/0xc06 [nvme] [ 101.400848] ? pick_next_task_fair+0x290/0x5f0 [ 101.405807] ? __switch_to+0x1f5/0x430 [ 101.409988] ? put_prev_entity+0x2f/0xd0 [ 101.414365] process_one_work+0x141/0x340 [ 101.418836] worker_thread+0x47/0x3e0 [ 101.422921] kthread+0xf5/0x130 [ 101.426424] ? rescuer_thread+0x380/0x380 [ 101.430896] ? kthread_associate_blkcg+0x90/0x90 [ 101.436048] ret_from_fork+0x1f/0x30 [ 101.440034] Code: 48 83 3c ca 00 0f 84 2b 01 00 00 48 63 cd 48 8b 93 10 01 00 00 8b 0c 88 48 8b 83 20 01 00 00 4a 03 14 f5 60 04 af 81 48 8b 0c c8 <48> 8b 81 98 00 00 00 f0 4c 0f ab 30 8b 81 f8 00 00 00 89 42 44 [ 101.461116] RIP: blk_mq_map_swqueue+0xbc/0x200 RSP: ffffc9000234fd70 [ 101.468205] CR2: 0000000000000098 [ 101.471907] ---[ end trace 5fe710f98228a3ca ]--- [ 101.482489] Kernel panic - not syncing: Fatal exception [ 101.488505] Kernel Offset: disabled [ 101.497752] ---[ end Kernel panic - not syncing: Fatal exception Suggested-by: Christoph Hellwig <hch@lst.de> Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-mq.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 85954a0b4394..6dae8f631253 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2633,9 +2633,27 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set) static int blk_mq_update_queue_map(struct blk_mq_tag_set *set) { - if (set->ops->map_queues) + if (set->ops->map_queues) { + int cpu; + /* + * transport .map_queues is usually done in the following + * way: + * + * for (queue = 0; queue < set->nr_hw_queues; queue++) { + * mask = get_cpu_mask(queue) + * for_each_cpu(cpu, mask) + * set->mq_map[cpu] = queue; + * } + * + * When we need to remap, the table has to be cleared for + * killing stale mapping since one CPU may not be mapped + * to any hw queue. + */ + for_each_possible_cpu(cpu) + set->mq_map[cpu] = 0; + return set->ops->map_queues(set); - else + } else return blk_mq_map_queues(set); } -- 2.9.5
WARNING: multiple messages have this Message-ID (diff)
From: ming.lei@redhat.com (Ming Lei) Subject: [PATCH V2 4/6] blk-mq: avoid to map CPU into stale hw queue Date: Thu, 14 Dec 2017 10:31:01 +0800 [thread overview] Message-ID: <20171214023103.18272-5-ming.lei@redhat.com> (raw) In-Reply-To: <20171214023103.18272-1-ming.lei@redhat.com> blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its previous map isn't cleared yet, and may point to one stale hw queue index. This patch fixes the following issue by clearing the mapping table before setting it up in blk_mq_pci_map_queues(). This patches fixes this following issue reported by Zhang Yi: [ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f [ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200 [ 101.216346] PGD 0 P4D 0 [ 101.219171] Oops: 0000 [#1] SMP [ 101.222674] Modules linked in: sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mxm_wmi intel_rapl_perf iTCO_wdt ipmi_si ipmi_devintf pcspkr iTCO_vendor_support sg dcdbas ipmi_msghandler wmi mei_me lpc_ich shpchp mei acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel libata tg3 nvme nvme_core megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 101.284881] CPU: 0 PID: 504 Comm: kworker/u25:5 Not tainted 4.15.0-rc2 #1 [ 101.292455] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017 [ 101.301001] Workqueue: nvme-wq nvme_reset_work [nvme] [ 101.306636] task: 00000000f2c53190 task.stack: 000000002da874f9 [ 101.313241] RIP: 0010:blk_mq_map_swqueue+0xbc/0x200 [ 101.318681] RSP: 0018:ffffc9000234fd70 EFLAGS: 00010282 [ 101.324511] RAX: ffff88047ffc9480 RBX: ffff88047e130850 RCX: 0000000000000000 [ 101.332471] RDX: ffffe8ffffd40580 RSI: ffff88047e509b40 RDI: ffff88046f37a008 [ 101.340432] RBP: 000000000000000b R08: ffff88046f37a008 R09: 0000000011f94280 [ 101.348392] R10: ffff88047ffd4d00 R11: 0000000000000000 R12: ffff88046f37a008 [ 101.356353] R13: ffff88047e130f38 R14: 000000000000000b R15: ffff88046f37a558 [ 101.364314] FS: 0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000 [ 101.373342] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.379753] CR2: 0000000000000098 CR3: 000000047f409004 CR4: 00000000001606f0 [ 101.387714] Call Trace: [ 101.390445] blk_mq_update_nr_hw_queues+0xbf/0x130 [ 101.395791] nvme_reset_work+0x6f4/0xc06 [nvme] [ 101.400848] ? pick_next_task_fair+0x290/0x5f0 [ 101.405807] ? __switch_to+0x1f5/0x430 [ 101.409988] ? put_prev_entity+0x2f/0xd0 [ 101.414365] process_one_work+0x141/0x340 [ 101.418836] worker_thread+0x47/0x3e0 [ 101.422921] kthread+0xf5/0x130 [ 101.426424] ? rescuer_thread+0x380/0x380 [ 101.430896] ? kthread_associate_blkcg+0x90/0x90 [ 101.436048] ret_from_fork+0x1f/0x30 [ 101.440034] Code: 48 83 3c ca 00 0f 84 2b 01 00 00 48 63 cd 48 8b 93 10 01 00 00 8b 0c 88 48 8b 83 20 01 00 00 4a 03 14 f5 60 04 af 81 48 8b 0c c8 <48> 8b 81 98 00 00 00 f0 4c 0f ab 30 8b 81 f8 00 00 00 89 42 44 [ 101.461116] RIP: blk_mq_map_swqueue+0xbc/0x200 RSP: ffffc9000234fd70 [ 101.468205] CR2: 0000000000000098 [ 101.471907] ---[ end trace 5fe710f98228a3ca ]--- [ 101.482489] Kernel panic - not syncing: Fatal exception [ 101.488505] Kernel Offset: disabled [ 101.497752] ---[ end Kernel panic - not syncing: Fatal exception Suggested-by: Christoph Hellwig <hch at lst.de> Reported-by: Yi Zhang <yi.zhang at redhat.com> Signed-off-by: Ming Lei <ming.lei at redhat.com> --- block/blk-mq.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 85954a0b4394..6dae8f631253 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2633,9 +2633,27 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set) static int blk_mq_update_queue_map(struct blk_mq_tag_set *set) { - if (set->ops->map_queues) + if (set->ops->map_queues) { + int cpu; + /* + * transport .map_queues is usually done in the following + * way: + * + * for (queue = 0; queue < set->nr_hw_queues; queue++) { + * mask = get_cpu_mask(queue) + * for_each_cpu(cpu, mask) + * set->mq_map[cpu] = queue; + * } + * + * When we need to remap, the table has to be cleared for + * killing stale mapping since one CPU may not be mapped + * to any hw queue. + */ + for_each_possible_cpu(cpu) + set->mq_map[cpu] = 0; + return set->ops->map_queues(set); - else + } else return blk_mq_map_queues(set); } -- 2.9.5
next prev parent reply other threads:[~2017-12-14 2:32 UTC|newest] Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-12-14 2:30 [PATCH V2 0/6] blk-mq: fix race related with device deletion/reset/switching sched Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-14 2:30 ` [PATCH V2 1/6] blk-mq: quiesce queue before freeing queue Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-29 9:54 ` Christoph Hellwig 2017-12-29 9:54 ` Christoph Hellwig 2017-12-14 2:30 ` [PATCH V2 2/6] blk-mq: support concurrent blk_mq_quiesce_queue() Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-29 9:58 ` Christoph Hellwig 2017-12-29 9:58 ` Christoph Hellwig 2018-01-02 3:01 ` Ming Lei 2018-01-02 3:01 ` Ming Lei 2017-12-14 2:31 ` [PATCH V2 3/6] blk-mq: quiesce queue during switching io sched and updating nr_requests Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-29 9:58 ` Christoph Hellwig 2017-12-29 9:58 ` Christoph Hellwig 2017-12-14 2:31 ` Ming Lei [this message] 2017-12-14 2:31 ` [PATCH V2 4/6] blk-mq: avoid to map CPU into stale hw queue Ming Lei 2017-12-29 9:59 ` Christoph Hellwig 2017-12-29 9:59 ` Christoph Hellwig 2017-12-14 2:31 ` [PATCH V2 5/6] blk-mq: fix race between updating nr_hw_queues and switching io sched Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-29 9:59 ` Christoph Hellwig 2017-12-29 9:59 ` Christoph Hellwig 2017-12-14 2:31 ` [PATCH V2 6/6] nvme-pci: remove .init_request callback Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-21 8:20 ` Sagi Grimberg 2017-12-21 8:20 ` Sagi Grimberg 2017-12-21 8:36 ` Johannes Thumshirn 2017-12-21 8:36 ` Johannes Thumshirn 2017-12-22 1:34 ` Ming Lei 2017-12-22 1:34 ` Ming Lei 2017-12-24 8:50 ` Sagi Grimberg 2017-12-24 8:50 ` Sagi Grimberg 2017-12-19 15:30 ` [PATCH V2 0/6] blk-mq: fix race related with device deletion/reset/switching sched Yi Zhang 2017-12-19 15:30 ` Yi Zhang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20171214023103.18272-5-ming.lei@redhat.com \ --to=ming.lei@redhat.com \ --cc=axboe@fb.com \ --cc=bart.vanassche@sandisk.com \ --cc=hch@lst.de \ --cc=jthumshirn@suse.de \ --cc=keith.busch@intel.com \ --cc=linux-block@vger.kernel.org \ --cc=linux-nvme@lists.infradead.org \ --cc=sagi@grimberg.me \ --cc=yi.zhang@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.