From: Jens Axboe <axboe@kernel.dk>
To: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>, Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org, Brian King <brking@linux.vnet.ibm.com>,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Oops when completing request on the wrong queue
Date: Wed, 24 Aug 2016 14:36:38 -0600
Message-ID: <dbe42007-8109-2e21-d0f3-0778007cd152@kernel.dk>
In-Reply-To: <49a954e6-2f96-8a63-ce15-2c82c1a1d36d@kernel.dk>

On 08/24/2016 12:34 PM, Jens Axboe wrote:
> On 08/23/2016 03:14 PM, Jens Axboe wrote:
>> On 08/23/2016 03:11 PM, Jens Axboe wrote:
>>> On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote:
>>>> Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> writes:
>>>>
>>>>>> Can you share what you ran to online/offline CPUs? I can't reproduce
>>>>>> this here.
>>>>>
>>>>> I was using the ppc64_cpu tool, which shouldn't do anything more than
>>>>> write to sysfs, but I just reproduced it with the script below.
>>>>>
>>>>> Note that this is ppc64le. I don't have an x86 box at hand to attempt
>>>>> to reproduce right now, but I'll look for one and see how it goes.
>>>>
>>>> Hi,
>>>>
>>>> Any luck reproducing it? We were initially reproducing with a
>>>> proprietary stress test, but I gave a generated fio jobfile a try,
>>>> combined with the SMT script I shared earlier, and I could reproduce
>>>> the crash consistently in less than 10 minutes of execution. This was
>>>> still ppc64le, though. I couldn't get my hands on NVMe on x86 yet.
>>>
>>> Nope, I have not been able to reproduce it. How long do the CPU
>>> offline/online actions take on ppc64? They're pretty slow on x86, which
>>> may hide the issue. I took out the various printk's associated with
>>> bringing a CPU off/online, as well as the IRQ breaking parts, but that
>>> didn't help in reproducing it.
>>>
>>>> The job file I used, as well as the smt.sh script, in case you want to
>>>> give it a try:
>>>>
>>>> jobfile: http://krisman.be/k/nvmejob.fio
>>>> smt.sh: http://krisman.be/k/smt.sh
>>>>
>>>> Still, the trigger seems to be consistently a heavy load of IO
>>>> combined with CPU addition/removal.
>>>
>>> My workload looks similar to yours, in that it's high depth and with a
>>> lot of jobs to keep most CPUs loaded. My bash script is different from
>>> yours, so I'll try that and see if it helps here.
>>
>> Actually, I take that back. You're not using O_DIRECT, hence all your
>> jobs are running at QD=1, not the 256 specified. That looks odd, but
>> I'll try it; maybe it'll hit something different.
>
> Can you try this patch? It's not perfect, but I'll be interested to see
> if it makes a difference for you.

This one should handle the WARN_ON() for running the hw queue on the
wrong CPU as well.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 758a9b5..b21a9b9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -810,11 +810,12 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
-
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+		cpu_online(hctx->next_cpu));
+
 	hctx->run++;
 
 	/*
@@ -1075,15 +1082,11 @@ static void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx,
 }
 
 void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
-		bool async)
+			   bool async)
 {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct request_queue *q = rq->q;
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
-
-	current_ctx = blk_mq_get_ctx(q);
-	if (!cpu_online(ctx->cpu))
-		rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
@@ -1093,8 +1096,6 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 
 	if (run_queue)
 		blk_mq_run_hw_queue(hctx, async);
-
-	blk_mq_put_ctx(current_ctx);
 }
 
 static void blk_mq_insert_requests(struct request_queue *q,
@@ -1105,14 +1106,9 @@ static void blk_mq_insert_requests(struct request_queue *q,
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *current_ctx;
 
 	trace_block_unplug(q, depth, !from_schedule);
 
-	current_ctx = blk_mq_get_ctx(q);
-
-	if (!cpu_online(ctx->cpu))
-		ctx = current_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	/*
@@ -1125,14 +1121,12 @@ static void blk_mq_insert_requests(struct request_queue *q,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
-		rq->mq_ctx = ctx;
 		__blk_mq_insert_req_list(hctx, ctx, rq, false);
 	}
 	blk_mq_hctx_mark_pending(hctx, ctx);
 	spin_unlock(&ctx->lock);
 
 	blk_mq_run_hw_queue(hctx, from_schedule);
-	blk_mq_put_ctx(current_ctx);
 }
 
 static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -1692,6 +1686,11 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 	while (!list_empty(&tmp)) {
 		struct request *rq;
 
+		/*
+		 * FIXME: we can't just move the req here. We'd have to
+		 * pull off the bio chain and add it to a new request
+		 * on the target hw queue
+		 */
 		rq = list_first_entry(&tmp, struct request, queuelist);
 		rq->mq_ctx = ctx;
 		list_move_tail(&rq->queuelist, &ctx->rq_list);

-- 
Jens Axboe
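As a rough illustration of the reproduction setup discussed in the quoted
thread above, for readers without the referenced nvmejob.fio/smt.sh files:
the idea is to keep the NVMe device saturated with high-queue-depth direct
IO while CPUs are repeatedly offlined and onlined through sysfs. The device
path, CPU range, and fio parameters below are assumptions made for this
sketch, not the actual job and script Gabriel used.

#!/bin/bash
# Sketch only: hammer an NVMe device with fio while toggling CPUs via
# sysfs.  Device path, CPU range, and fio options are assumed values.

DEV=/dev/nvme0n1        # assumed test device
CPUS=$(seq 8 15)        # assumed set of CPUs that are safe to toggle

# direct=1 matters: without O_DIRECT, buffered libaio submission is
# effectively synchronous, so the job runs at QD=1 regardless of the
# iodepth setting, as noted in the thread above.
fio --name=stress --filename=$DEV --direct=1 --rw=randread \
    --ioengine=libaio --iodepth=256 --numjobs=16 \
    --time_based --runtime=600 --group_reporting &
FIO_PID=$!

# Keep offlining/onlining CPUs for as long as the fio job is running.
while kill -0 $FIO_PID 2>/dev/null; do
	for cpu in $CPUS; do
		echo 0 > /sys/devices/system/cpu/cpu$cpu/online
	done
	sleep 1
	for cpu in $CPUS; do
		echo 1 > /sys/devices/system/cpu/cpu$cpu/online
	done
	sleep 1
done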