From: Ming Lei <ming.lei@redhat.com> To: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@fb.com>, linux-block@vger.kernel.org Cc: Bart Van Assche <bart.vanassche@sandisk.com>, Keith Busch <keith.busch@intel.com>, Sagi Grimberg <sagi@grimberg.me>, Yi Zhang <yi.zhang@redhat.com>, Johannes Thumshirn <jthumshirn@suse.de>, Ming Lei <ming.lei@redhat.com> Subject: [PATCH V2 6/6] nvme-pci: remove .init_request callback Date: Thu, 14 Dec 2017 10:31:03 +0800 [thread overview] Message-ID: <20171214023103.18272-7-ming.lei@redhat.com> (raw) In-Reply-To: <20171214023103.18272-1-ming.lei@redhat.com> It may cause race by setting 'nvmeq' in nvme_init_request() because .init_request is called inside switching io scheduler, which may happen when the NVMe device is being resetted and its nvme queues are being freed and created. We don't have any sync between the two pathes. This patch removes the .init_request callback and sets the nvmeq runtime via hctx->driver_data, which is reliable because .init_hctx is always called after hw queue is resized in reset handler, then the following bug is fixed: [ 93.268391] kernel BUG at drivers/nvme/host/pci.c:408! [ 93.274146] invalid opcode: 0000 [#1] SMP [ 93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3 megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4 [ 93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017 [ 93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1 [ 93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme] [ 93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246 [ 93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008 [ 93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008 [ 93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00 [ 93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00 [ 93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071 [ 93.422969] FS: 00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000 [ 93.431996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0 [ 93.446368] Call Trace: [ 93.449103] blk_mq_alloc_rqs+0x231/0x2a0 [ 93.453579] blk_mq_sched_alloc_tags.isra.8+0x42/0x80 [ 93.459214] blk_mq_init_sched+0x7e/0x140 [ 93.463687] elevator_switch+0x5a/0x1f0 [ 93.467966] ? elevator_get.isra.17+0x52/0xc0 [ 93.472826] elv_iosched_store+0xde/0x150 [ 93.477299] queue_attr_store+0x4e/0x90 [ 93.481580] kernfs_fop_write+0xfa/0x180 [ 93.485958] __vfs_write+0x33/0x170 [ 93.489851] ? __inode_security_revalidate+0x4c/0x60 [ 93.495390] ? selinux_file_permission+0xda/0x130 [ 93.500641] ? _cond_resched+0x15/0x30 [ 93.504815] vfs_write+0xad/0x1a0 [ 93.508512] SyS_write+0x52/0xc0 [ 93.512113] do_syscall_64+0x61/0x1a0 [ 93.516199] entry_SYSCALL64_slow_path+0x25/0x25 [ 93.521351] RIP: 0033:0x7f33ce96aab0 [ 93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0 [ 93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001 [ 93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740 [ 93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400 [ 93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000 [ 93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0 74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89 [ 93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8 [ 93.602273] ---[ end trace 810dde3993e5f14e ]--- Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/nvme/host/pci.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index f5800c3c9082..9eff33b21ca5 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -397,19 +397,6 @@ static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, return 0; } -static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req, - unsigned int hctx_idx, unsigned int numa_node) -{ - struct nvme_dev *dev = set->driver_data; - struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - int queue_idx = (set == &dev->tagset) ? hctx_idx + 1 : 0; - struct nvme_queue *nvmeq = dev->queues[queue_idx]; - - BUG_ON(!nvmeq); - iod->nvmeq = nvmeq; - return 0; -} - static int nvme_pci_map_queues(struct blk_mq_tag_set *set) { struct nvme_dev *dev = set->driver_data; @@ -448,7 +435,8 @@ static void **nvme_pci_iod_list(struct request *req) return (void **)(iod->sg + blk_rq_nr_phys_segments(req)); } -static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev) +static blk_status_t nvme_init_iod(struct request *rq, + struct nvme_dev *dev, struct nvme_queue *nvmeq) { struct nvme_iod *iod = blk_mq_rq_to_pdu(rq); int nseg = blk_rq_nr_phys_segments(rq); @@ -469,6 +457,7 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev) iod->npages = -1; iod->nents = 0; iod->length = size; + iod->nvmeq = nvmeq; return BLK_STS_OK; } @@ -876,7 +865,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx, if (ret) return ret; - ret = nvme_init_iod(req, dev); + ret = nvme_init_iod(req, dev, nvmeq); if (ret) goto out_free_cmd; @@ -1484,7 +1473,6 @@ static const struct blk_mq_ops nvme_mq_admin_ops = { .complete = nvme_pci_complete_rq, .init_hctx = nvme_admin_init_hctx, .exit_hctx = nvme_admin_exit_hctx, - .init_request = nvme_init_request, .timeout = nvme_timeout, }; @@ -1492,7 +1480,6 @@ static const struct blk_mq_ops nvme_mq_ops = { .queue_rq = nvme_queue_rq, .complete = nvme_pci_complete_rq, .init_hctx = nvme_init_hctx, - .init_request = nvme_init_request, .map_queues = nvme_pci_map_queues, .timeout = nvme_timeout, .poll = nvme_poll, -- 2.9.5
WARNING: multiple messages have this Message-ID (diff)
From: ming.lei@redhat.com (Ming Lei) Subject: [PATCH V2 6/6] nvme-pci: remove .init_request callback Date: Thu, 14 Dec 2017 10:31:03 +0800 [thread overview] Message-ID: <20171214023103.18272-7-ming.lei@redhat.com> (raw) In-Reply-To: <20171214023103.18272-1-ming.lei@redhat.com> It may cause race by setting 'nvmeq' in nvme_init_request() because .init_request is called inside switching io scheduler, which may happen when the NVMe device is being resetted and its nvme queues are being freed and created. We don't have any sync between the two pathes. This patch removes the .init_request callback and sets the nvmeq runtime via hctx->driver_data, which is reliable because .init_hctx is always called after hw queue is resized in reset handler, then the following bug is fixed: [ 93.268391] kernel BUG at drivers/nvme/host/pci.c:408! [ 93.274146] invalid opcode: 0000 [#1] SMP [ 93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3 megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4 [ 93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017 [ 93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1 [ 93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme] [ 93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246 [ 93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008 [ 93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008 [ 93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00 [ 93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00 [ 93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071 [ 93.422969] FS: 00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000 [ 93.431996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0 [ 93.446368] Call Trace: [ 93.449103] blk_mq_alloc_rqs+0x231/0x2a0 [ 93.453579] blk_mq_sched_alloc_tags.isra.8+0x42/0x80 [ 93.459214] blk_mq_init_sched+0x7e/0x140 [ 93.463687] elevator_switch+0x5a/0x1f0 [ 93.467966] ? elevator_get.isra.17+0x52/0xc0 [ 93.472826] elv_iosched_store+0xde/0x150 [ 93.477299] queue_attr_store+0x4e/0x90 [ 93.481580] kernfs_fop_write+0xfa/0x180 [ 93.485958] __vfs_write+0x33/0x170 [ 93.489851] ? __inode_security_revalidate+0x4c/0x60 [ 93.495390] ? selinux_file_permission+0xda/0x130 [ 93.500641] ? _cond_resched+0x15/0x30 [ 93.504815] vfs_write+0xad/0x1a0 [ 93.508512] SyS_write+0x52/0xc0 [ 93.512113] do_syscall_64+0x61/0x1a0 [ 93.516199] entry_SYSCALL64_slow_path+0x25/0x25 [ 93.521351] RIP: 0033:0x7f33ce96aab0 [ 93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0 [ 93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001 [ 93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740 [ 93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400 [ 93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000 [ 93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0 74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89 [ 93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8 [ 93.602273] ---[ end trace 810dde3993e5f14e ]--- Reported-by: Yi Zhang <yi.zhang at redhat.com> Signed-off-by: Ming Lei <ming.lei at redhat.com> --- drivers/nvme/host/pci.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index f5800c3c9082..9eff33b21ca5 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -397,19 +397,6 @@ static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, return 0; } -static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req, - unsigned int hctx_idx, unsigned int numa_node) -{ - struct nvme_dev *dev = set->driver_data; - struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - int queue_idx = (set == &dev->tagset) ? hctx_idx + 1 : 0; - struct nvme_queue *nvmeq = dev->queues[queue_idx]; - - BUG_ON(!nvmeq); - iod->nvmeq = nvmeq; - return 0; -} - static int nvme_pci_map_queues(struct blk_mq_tag_set *set) { struct nvme_dev *dev = set->driver_data; @@ -448,7 +435,8 @@ static void **nvme_pci_iod_list(struct request *req) return (void **)(iod->sg + blk_rq_nr_phys_segments(req)); } -static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev) +static blk_status_t nvme_init_iod(struct request *rq, + struct nvme_dev *dev, struct nvme_queue *nvmeq) { struct nvme_iod *iod = blk_mq_rq_to_pdu(rq); int nseg = blk_rq_nr_phys_segments(rq); @@ -469,6 +457,7 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev) iod->npages = -1; iod->nents = 0; iod->length = size; + iod->nvmeq = nvmeq; return BLK_STS_OK; } @@ -876,7 +865,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx, if (ret) return ret; - ret = nvme_init_iod(req, dev); + ret = nvme_init_iod(req, dev, nvmeq); if (ret) goto out_free_cmd; @@ -1484,7 +1473,6 @@ static const struct blk_mq_ops nvme_mq_admin_ops = { .complete = nvme_pci_complete_rq, .init_hctx = nvme_admin_init_hctx, .exit_hctx = nvme_admin_exit_hctx, - .init_request = nvme_init_request, .timeout = nvme_timeout, }; @@ -1492,7 +1480,6 @@ static const struct blk_mq_ops nvme_mq_ops = { .queue_rq = nvme_queue_rq, .complete = nvme_pci_complete_rq, .init_hctx = nvme_init_hctx, - .init_request = nvme_init_request, .map_queues = nvme_pci_map_queues, .timeout = nvme_timeout, .poll = nvme_poll, -- 2.9.5
next prev parent reply other threads:[~2017-12-14 2:32 UTC|newest] Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-12-14 2:30 [PATCH V2 0/6] blk-mq: fix race related with device deletion/reset/switching sched Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-14 2:30 ` [PATCH V2 1/6] blk-mq: quiesce queue before freeing queue Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-29 9:54 ` Christoph Hellwig 2017-12-29 9:54 ` Christoph Hellwig 2017-12-14 2:30 ` [PATCH V2 2/6] blk-mq: support concurrent blk_mq_quiesce_queue() Ming Lei 2017-12-14 2:30 ` Ming Lei 2017-12-29 9:58 ` Christoph Hellwig 2017-12-29 9:58 ` Christoph Hellwig 2018-01-02 3:01 ` Ming Lei 2018-01-02 3:01 ` Ming Lei 2017-12-14 2:31 ` [PATCH V2 3/6] blk-mq: quiesce queue during switching io sched and updating nr_requests Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-29 9:58 ` Christoph Hellwig 2017-12-29 9:58 ` Christoph Hellwig 2017-12-14 2:31 ` [PATCH V2 4/6] blk-mq: avoid to map CPU into stale hw queue Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-29 9:59 ` Christoph Hellwig 2017-12-29 9:59 ` Christoph Hellwig 2017-12-14 2:31 ` [PATCH V2 5/6] blk-mq: fix race between updating nr_hw_queues and switching io sched Ming Lei 2017-12-14 2:31 ` Ming Lei 2017-12-29 9:59 ` Christoph Hellwig 2017-12-29 9:59 ` Christoph Hellwig 2017-12-14 2:31 ` Ming Lei [this message] 2017-12-14 2:31 ` [PATCH V2 6/6] nvme-pci: remove .init_request callback Ming Lei 2017-12-21 8:20 ` Sagi Grimberg 2017-12-21 8:20 ` Sagi Grimberg 2017-12-21 8:36 ` Johannes Thumshirn 2017-12-21 8:36 ` Johannes Thumshirn 2017-12-22 1:34 ` Ming Lei 2017-12-22 1:34 ` Ming Lei 2017-12-24 8:50 ` Sagi Grimberg 2017-12-24 8:50 ` Sagi Grimberg 2017-12-19 15:30 ` [PATCH V2 0/6] blk-mq: fix race related with device deletion/reset/switching sched Yi Zhang 2017-12-19 15:30 ` Yi Zhang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20171214023103.18272-7-ming.lei@redhat.com \ --to=ming.lei@redhat.com \ --cc=axboe@fb.com \ --cc=bart.vanassche@sandisk.com \ --cc=hch@lst.de \ --cc=jthumshirn@suse.de \ --cc=keith.busch@intel.com \ --cc=linux-block@vger.kernel.org \ --cc=linux-nvme@lists.infradead.org \ --cc=sagi@grimberg.me \ --cc=yi.zhang@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.