From: Pavel Begunkov <asml.silence@gmail.com>
To: Hou Tao <houtao1@huawei.com>,
axboe@kernel.dk, linux-block@vger.kernel.org
Cc: osandov@fb.com, ming.lei@redhat.com
Subject: Re: [PATCH 1/2] block: make rq sector size accessible for block stats
Date: Tue, 28 May 2019 21:25:12 +0300 [thread overview]
Message-ID: <4ddc4092-e760-4c67-4dc6-33660c99b808@gmail.com> (raw)
In-Reply-To: <1a525ec8-3197-7cb4-bec7-bdd8b345cc84@huawei.com>
On 27/05/2019 14:39, Hou Tao wrote:
> Hi,
>
> On 2019/5/25 19:46, Pavel Begunkov wrote:
>> That's the same bug I posted a patch for about 3+ weeks ago. No answer
>> yet, however.
>> https://www.spinics.net/lists/linux-block/msg40049.html
>>
> Sorry for missing that.
Anyway, I hadn't fixed it properly there.
>
>> I think putting the time sampling at __blk_mq_end_request() won't give
>> sufficient precision for more sophisticated hybrid polling, because the
>> path __blk_mq_complete_request() -> __blk_mq_end_request() adds a lot
>> of overhead (redirecting the rq to another cpu, dma unmap, etc). That's
>> the case for the patchset mentioned above:
>> https://www.spinics.net/lists/linux-block/msg40044.html
>>
>> It works OK for 50-80us requests (e.g. SSD read), but is not good enough
>> for 20us and less (e.g. SSD write). I think it would be better to save
>> the time in advance, and use it later.
>>
> Yes, getting the end time from blk_mq_complete_request() will be more accurate.
What bothers me is: if we sample the time there, could we also use it
for throttling? I'm not sure what exactly it expects.
>
>> By the way, it seems that __blk_mq_end_request() is really the better
>> place to record stats, as not all drivers use __blk_mq_complete_request(),
>> even when they don't implement blk_poll().
>>
> And __blk_mq_complete_request() may be invoked multiple times (although that is
> not possible for NVMe devices), but __blk_mq_end_request() is only invoked once.
I doubt it; it would be rather strange to *complete* a request multiple
times. I need to check, but so far I've only seen drivers either use it
once or call blk_mq_end_request() directly.
>
> Regards,
> Tao
>
>>
>> On 21/05/2019 10:59, Hou Tao wrote:
>>> Currently rq->data_len will be decreased by partial completion or
>>> zeroed by completion, so when blk_stat_add() is invoked, data_len
>>> will be zero and there will never be samples in poll_cb because
>>> blk_mq_poll_stats_bkt() will return -1 if data_len is zero.
>>>
>>> We could move blk_stat_add() back to __blk_mq_complete_request(),
>>> but that would defeat the effort of calling ktime_get_ns()
>>> only once. Instead we can reuse the throtl_size field, use
>>> it for both block stats and block throttle, and adjust the
>>> logic in blk_mq_poll_stats_bkt() accordingly.
>>>
>>> Fixes: 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()")
>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>> ---
>>> block/blk-mq.c | 11 +++++------
>>> block/blk-throttle.c | 3 ++-
>>> include/linux/blkdev.h | 15 ++++++++++++---
>>> 3 files changed, 19 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 08a6248d8536..4d1462172f0f 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -44,12 +44,12 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
>>>
>>> static int blk_mq_poll_stats_bkt(const struct request *rq)
>>> {
>>> - int ddir, bytes, bucket;
>>> + int ddir, sectors, bucket;
>>>
>>> ddir = rq_data_dir(rq);
>>> - bytes = blk_rq_bytes(rq);
>>> + sectors = blk_rq_stats_sectors(rq);
>>>
>>> - bucket = ddir + 2*(ilog2(bytes) - 9);
>>> + bucket = ddir + 2 * ilog2(sectors);
>>>
>>> if (bucket < 0)
>>> return -1;
>>> @@ -329,6 +329,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>>> else
>>> rq->start_time_ns = 0;
>>> rq->io_start_time_ns = 0;
>>> + rq->stats_sectors = 0;
>>> rq->nr_phys_segments = 0;
>>> #if defined(CONFIG_BLK_DEV_INTEGRITY)
>>> rq->nr_integrity_segments = 0;
>>> @@ -678,9 +679,7 @@ void blk_mq_start_request(struct request *rq)
>>>
>>> if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
>>> rq->io_start_time_ns = ktime_get_ns();
>>> -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
>>> - rq->throtl_size = blk_rq_sectors(rq);
>>> -#endif
>>> + rq->stats_sectors = blk_rq_sectors(rq);
>>> rq->rq_flags |= RQF_STATS;
>>> rq_qos_issue(q, rq);
>>> }
>>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>>> index 1b97a73d2fb1..88459a4ac704 100644
>>> --- a/block/blk-throttle.c
>>> +++ b/block/blk-throttle.c
>>> @@ -2249,7 +2249,8 @@ void blk_throtl_stat_add(struct request *rq, u64 time_ns)
>>> struct request_queue *q = rq->q;
>>> struct throtl_data *td = q->td;
>>>
>>> - throtl_track_latency(td, rq->throtl_size, req_op(rq), time_ns >> 10);
>>> + throtl_track_latency(td, blk_rq_stats_sectors(rq), req_op(rq),
>>> + time_ns >> 10);
>>> }
>>>
>>> void blk_throtl_bio_endio(struct bio *bio)
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index 1aafeb923e7b..68a0841d3554 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -202,9 +202,12 @@ struct request {
>>> #ifdef CONFIG_BLK_WBT
>>> unsigned short wbt_flags;
>>> #endif
>>> -#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
>>> - unsigned short throtl_size;
>>> -#endif
>>> + /*
>>> + * rq sectors used for blk stats. It has the same value
>>> + * as blk_rq_sectors(rq), except that it is never zeroed
>>> + * by completion.
>>> + */
>>> + unsigned short stats_sectors;
>>>
>>> /*
>>> * Number of scatter-gather DMA addr+len pairs after
>>> @@ -892,6 +895,7 @@ static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
>>> * blk_rq_err_bytes() : bytes left till the next error boundary
>>> * blk_rq_sectors() : sectors left in the entire request
>>> * blk_rq_cur_sectors() : sectors left in the current segment
>>> + * blk_rq_stats_sectors() : sectors of the entire request used for stats
>>> */
>>> static inline sector_t blk_rq_pos(const struct request *rq)
>>> {
>>> @@ -920,6 +924,11 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
>>> return blk_rq_cur_bytes(rq) >> SECTOR_SHIFT;
>>> }
>>>
>>> +static inline unsigned int blk_rq_stats_sectors(const struct request *rq)
>>> +{
>>> + return rq->stats_sectors;
>>> +}
>>> +
>>> #ifdef CONFIG_BLK_DEV_ZONED
>>> static inline unsigned int blk_rq_zone_no(struct request *rq)
>>> {
>>>
>>
>
--
Yours sincerely,
Pavel Begunkov