From: Jens Axboe <axboe@fb.com>
To: Bart Van Assche <Bart.VanAssche@sandisk.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"osandov@osandov.com" <osandov@osandov.com>
Subject: Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers
Date: Thu, 12 Jan 2017 14:59:22 -0700
Message-ID: <20170112215922.GB25197@kernel.dk>
In-Reply-To: <1484257502.2720.21.camel@sandisk.com>

On Thu, Jan 12 2017, Bart Van Assche wrote:
> On Wed, 2017-01-11 at 14:40 -0700, Jens Axboe wrote:
> > @@ -451,11 +456,11 @@ void blk_insert_flush(struct request *rq)
> >  	 * processed directly without going through flush machinery. Queue
> >  	 * for normal execution.
> >  	 */
> > -	if ((policy & REQ_FSEQ_DATA) &&
> > -	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
> > -		if (q->mq_ops) {
> > -			blk_mq_insert_request(rq, false, true, false);
> > -		} else
> > +	if (((policy & REQ_FSEQ_DATA) &&
> > +	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH)))) {
> > +		if (q->mq_ops)
> > +			blk_mq_sched_insert_request(rq, false, true, false);
> > +		else
> >  			list_add_tail(&rq->queuelist, &q->queue_head);
> >  		return;
> >  	}
>
> Not that it really matters, but this change adds a pair of parentheses --
> "if (e)" is changed into "if ((e))". Is this necessary?

I fixed that up earlier today, as I noticed the same. So that's gone in
the current -git tree.

> > +void blk_mq_sched_free_hctx_data(struct request_queue *q,
> > +				 void (*exit)(struct blk_mq_hw_ctx *))
> > +{
> > +	struct blk_mq_hw_ctx *hctx;
> > +	int i;
> > +
> > +	queue_for_each_hw_ctx(q, hctx, i) {
> > +		if (exit)
> > +			exit(hctx);
> > +		kfree(hctx->sched_data);
> > +		hctx->sched_data = NULL;
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
> > +
> > +int blk_mq_sched_init_hctx_data(struct request_queue *q, size_t size,
> > +				int (*init)(struct blk_mq_hw_ctx *),
> > +				void (*exit)(struct blk_mq_hw_ctx *))
> > +{
> > +	struct blk_mq_hw_ctx *hctx;
> > +	int ret;
> > +	int i;
> > +
> > +	queue_for_each_hw_ctx(q, hctx, i) {
> > +		hctx->sched_data = kmalloc_node(size, GFP_KERNEL, hctx->numa_node);
> > +		if (!hctx->sched_data) {
> > +			ret = -ENOMEM;
> > +			goto error;
> > +		}
> > +
> > +		if (init) {
> > +			ret = init(hctx);
> > +			if (ret) {
> > +				/*
> > +				 * We don't want to give exit() a partially
> > +				 * initialized sched_data. init() must clean up
> > +				 * if it fails.
> > +				 */
> > +				kfree(hctx->sched_data);
> > +				hctx->sched_data = NULL;
> > +				goto error;
> > +			}
> > +		}
> > +	}
> > +
> > +	return 0;
> > +error:
> > +	blk_mq_sched_free_hctx_data(q, exit);
> > +	return ret;
> > +}
>
> If one of the init() calls by blk_mq_sched_init_hctx_data() fails then
> blk_mq_sched_free_hctx_data() will call exit() even for hctx's for which
> init() has not been called. How about changing "if (exit)" into "if (exit &&
> hctx->sched_data)" such that exit() is only called for hctx's for which
> init() has been called?

Good point, I'll make that change to the exit function.

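
For illustration only, here is a userspace sketch of the guarded cleanup being
discussed. It is not the patch itself: struct hw_ctx, init_hctx_data(),
free_hctx_data(), demo_init() and demo_exit() are hypothetical stand-ins,
malloc()/free() replace kmalloc_node()/kfree(), and -1 replaces -ENOMEM. It
assumes the array of contexts starts out zeroed, so that contexts never
reached by the init loop have a NULL sched_data.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for struct blk_mq_hw_ctx; only the field used here. */
struct hw_ctx {
	void *sched_data;
};

static int init_calls, exit_calls;

/* Hypothetical init hook that fails on its third invocation. */
static int demo_init(struct hw_ctx *h)
{
	(void)h;
	return ++init_calls == 3 ? -1 : 0;
}

/* Hypothetical exit hook; it must only ever see initialized contexts. */
static void demo_exit(struct hw_ctx *h)
{
	assert(h->sched_data != NULL);
	exit_calls++;
}

/* Cleanup with the suggested "exit && sched_data" guard. */
static void free_hctx_data(struct hw_ctx *hctxs, int nr,
			   void (*exit_fn)(struct hw_ctx *))
{
	int i;

	for (i = 0; i < nr; i++) {
		if (exit_fn && hctxs[i].sched_data)
			exit_fn(&hctxs[i]);
		free(hctxs[i].sched_data);
		hctxs[i].sched_data = NULL;
	}
}

/* Same control flow as the quoted init function, modelled in userspace. */
static int init_hctx_data(struct hw_ctx *hctxs, int nr, size_t size,
			  int (*init_fn)(struct hw_ctx *),
			  void (*exit_fn)(struct hw_ctx *))
{
	int ret = 0;
	int i;

	for (i = 0; i < nr; i++) {
		hctxs[i].sched_data = malloc(size);
		if (!hctxs[i].sched_data) {
			ret = -1;	/* stand-in for -ENOMEM */
			goto error;
		}
		if (init_fn) {
			ret = init_fn(&hctxs[i]);
			if (ret) {
				/* init_fn() cleaned up after itself; drop the buffer. */
				free(hctxs[i].sched_data);
				hctxs[i].sched_data = NULL;
				goto error;
			}
		}
	}
	return 0;
error:
	free_hctx_data(hctxs, nr, exit_fn);
	return ret;
}
```

With four zeroed contexts and an init hook that fails on the third, the exit
hook runs only for the first two; without the guard it would also run for the
failed and the untouched contexts.
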
> > +struct request *blk_mq_sched_get_request(struct request_queue *q,
> > +					 struct bio *bio,
> > +					 unsigned int op,
> > +					 struct blk_mq_alloc_data *data)
> > +{
> > +	struct elevator_queue *e = q->elevator;
> > +	struct blk_mq_hw_ctx *hctx;
> > +	struct blk_mq_ctx *ctx;
> > +	struct request *rq;
> > +
> > +	blk_queue_enter_live(q);
> > +	ctx = blk_mq_get_ctx(q);
> > +	hctx = blk_mq_map_queue(q, ctx->cpu);
> > +
> > +	blk_mq_set_alloc_data(data, q, 0, ctx, hctx);
> > +
> > +	if (e) {
> > +		data->flags |= BLK_MQ_REQ_INTERNAL;
> > +		if (e->type->ops.mq.get_request)
> > +			rq = e->type->ops.mq.get_request(q, op, data);
> > +		else
> > +			rq = __blk_mq_alloc_request(data, op);
> > +	} else {
> > +		rq = __blk_mq_alloc_request(data, op);
> > +		if (rq) {
> > +			rq->tag = rq->internal_tag;
> > +			rq->internal_tag = -1;
> > +		}
> > +	}
> > +
> > +	if (rq) {
> > +		rq->elv.icq = NULL;
> > +		if (e && e->type->icq_cache)
> > +			blk_mq_sched_assign_ioc(q, rq, bio);
> > +		data->hctx->queued++;
> > +		return rq;
> > +	}
> > +
> > +	blk_queue_exit(q);
> > +	return NULL;
> > +}
>
> The "rq->tag = rq->internal_tag; rq->internal_tag = -1;" occurs not only
> here but also in blk_mq_alloc_request_hctx(). Has it been considered to move
> that code into __blk_mq_alloc_request()?

Yes, it's in two locations. I wanted to keep it out of
__blk_mq_alloc_request(), so we can still use that for normal tag
allocations. But maybe it's better for __blk_mq_alloc_request() to just
do:

	if (flags & BLK_MQ_REQ_INTERNAL) {
		rq->tag = -1;
		rq->internal_tag = tag;
	} else {
		rq->tag = tag;
		rq->internal_tag = -1;
	}

and handle it directly in there. What do you think?

> > @@ -223,14 +225,17 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
> >
> >  	tag = blk_mq_get_tag(data);
> >  	if (tag != BLK_MQ_TAG_FAIL) {
> > -		rq = data->hctx->tags->rqs[tag];
> > +		struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
> > +
> > +		rq = tags->rqs[tag];
> >
> >  		if (blk_mq_tag_busy(data->hctx)) {
> >  			rq->rq_flags = RQF_MQ_INFLIGHT;
> >  			atomic_inc(&data->hctx->nr_active);
> >  		}
> >
> > -		rq->tag = tag;
> > +		rq->tag = -1;
> > +		rq->internal_tag = tag;
> >  		blk_mq_rq_ctx_init(data->q, data->ctx, rq, op);
> >  		return rq;
> >  	}
>
> How about using the following code for tag assignment instead of "rq->tag =
> -1; rq->internal_tag = tag"?
>
> 	if (data->flags & BLK_MQ_REQ_INTERNAL) {
> 		rq->tag = -1;
> 		rq->internal_tag = tag;
> 	} else {
> 		rq->tag = tag;
> 		rq->internal_tag = -1;
> 	}

Hah, nevermind, I should have read further. I guess we agree, I'll make
that change.

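
As a rough userspace sketch of the agreed-upon assignment (illustrative only):
struct req, assign_tag() and REQ_INTERNAL are hypothetical stand-ins for
struct request, the tail of __blk_mq_alloc_request() and
BLK_MQ_REQ_INTERNAL, and the flag's bit value here is made up. The point is
that exactly one of the two tag fields holds the allocated tag and the other
is -1, so neither caller needs to swap them afterwards.

```c
#include <assert.h>

/* Hypothetical stand-in for BLK_MQ_REQ_INTERNAL; the bit value is arbitrary. */
#define REQ_INTERNAL (1U << 0)

/* Only the two tag fields of struct request are modelled. */
struct req {
	int tag;
	int internal_tag;
};

/*
 * Store an allocated tag in the field selected by the allocation flags:
 * scheduler-internal allocations use internal_tag, everything else uses
 * the normal driver tag.
 */
static void assign_tag(struct req *rq, unsigned int flags, int tag)
{
	assert(tag >= 0);	/* caller has already ruled out allocation failure */

	if (flags & REQ_INTERNAL) {
		rq->tag = -1;
		rq->internal_tag = tag;
	} else {
		rq->tag = tag;
		rq->internal_tag = -1;
	}
}
```
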
> > @@ -313,6 +313,9 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> >  		goto out_queue_exit;
> >  	}
> >
> > +	rq->tag = rq->internal_tag;
> > +	rq->internal_tag = -1;
> > +
> >  	return rq;
> >
> >  out_queue_exit:
> > @@ -321,10 +324,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> >  }
> >  EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
>
> Should something like "WARN_ON_ONCE(flags & BLK_MQ_REQ_INTERNAL)" be added
> at the start of this function to avoid that BLK_MQ_REQ_INTERNAL is passed in
> from outside the block layer?
Yes, seems like a prudent safety check. I'll add it, thanks.
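
A userspace sketch of what such a warn-once check could look like, for
illustration only. The kernel's WARN_ON_ONCE() is not available outside the
kernel, so warn_once_if_internal() below is a hypothetical model of it:
it reports the first offending call on stderr and, like WARN_ON_ONCE(),
returns whether the condition was true. REQ_INTERNAL is again a made-up
stand-in for BLK_MQ_REQ_INTERNAL. Whether the real function should also fail
the allocation is not decided in this thread, so the sketch leaves that to
the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for BLK_MQ_REQ_INTERNAL; the bit value is arbitrary. */
#define REQ_INTERNAL (1U << 0)

/* Counts how many warnings were actually printed (for demonstration). */
static int warnings_printed;

/*
 * Model of WARN_ON_ONCE(flags & BLK_MQ_REQ_INTERNAL): warn the first time
 * an external caller passes the internal flag, stay silent afterwards,
 * and always return whether the condition was hit.
 */
static bool warn_once_if_internal(unsigned int flags)
{
	static bool warned;
	bool hit = (flags & REQ_INTERNAL) != 0;

	if (hit && !warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: internal flag passed from outside the block layer\n");
	}
	return hit;
}
```
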
--
Jens Axboe