From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <tom.leiming@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/4] block: add scalable completion tracking of requests
Date: Sat, 5 Nov 2016 14:49:57 -0600
Message-ID: <c0444b09-3f50-cb36-52f2-6e3860c9f0b7@kernel.dk>
In-Reply-To: <CACVXFVNnZoXOjVmfaCk+qhTf6W3UG6feUPsHWHYin2q0ELK4Ww@mail.gmail.com>

On 11/04/2016 05:13 PM, Ming Lei wrote:
>>> Even though it is true, the statistics still may become a mess with rare
>>> collisions.
>>
>>
>> How so? Not saying we could not improve it, but we're trading off
>> precision for scalability. My claim is that the existing code is good
>> enough. I've run a TON of testing on it, since I've used it for multiple
>> projects, and it's been solid.
>
> +static void blk_stat_flush_batch(struct blk_rq_stat *stat)
> +{
> +       if (!stat->nr_batch)
> +               return;
> +       if (!stat->nr_samples)
> +               stat->mean = div64_s64(stat->batch, stat->nr_batch);
>
> For example, two reqs (A & B) are completed at the same time, with A
> on CPU0 and B on CPU1.
>
> If the last two writes in the function are observed out of order from
> CPU1, then for B, CPU1 runs the above branch when it sees stat->batch
> already set to zero but nr_samples not yet updated, and a divide-by-zero
> is triggered.

We should probably just read nr_batch with READ_ONCE(). I'm fine
with the stats being a bit off in the rare case of a collision, but we
can't have a divide-by-zero, obviously.
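
For illustration, the READ_ONCE() idea could look roughly like the
userspace sketch below. This is not the actual patch: the struct layout,
the _sketch names, the READ_ONCE() stand-in (which relies on the GCC/Clang
__typeof__ extension) and the trailing counter updates are assumptions.
The point is only that nr_batch is loaded once, so the zero check and the
divisor always use the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(); assumes GCC/Clang. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, simplified version of struct blk_rq_stat. */
struct rq_stat_sketch {
	int64_t mean;
	int64_t batch;
	int64_t nr_samples;
	int32_t nr_batch;
};

static void stat_flush_batch_sketch(struct rq_stat_sketch *stat)
{
	/* Load nr_batch exactly once; a racing reset cannot change it below. */
	int32_t nr_batch = READ_ONCE(stat->nr_batch);

	if (!nr_batch)
		return;

	/* This sketch assumes the counters never go negative. */
	assert(nr_batch > 0);

	if (!stat->nr_samples)
		stat->mean = stat->batch / nr_batch;
	else
		stat->mean = (stat->mean * stat->nr_samples + stat->batch) /
			     (stat->nr_samples + nr_batch);

	/* Assumed tail of the function: fold the batch in and reset it. */
	stat->nr_samples += nr_batch;
	stat->nr_batch = 0;
	stat->batch = 0;
}
```

A racing flush can still make the mean slightly off, which is the accepted
trade-off, but the divisor can no longer be re-read as zero after the check.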

>
> +       else {
> +               stat->mean = div64_s64((stat->mean * stat->nr_samples) +
> +                                       stat->batch,
> +                                       stat->nr_samples + stat->nr_batch);
> +       }
>
> BTW, the 'if/else' above can be removed, and 'stat->mean' can always be
> computed the second way.

True, they could be collapsed.
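
A minimal sketch of why the collapse works, using a hypothetical helper
(merged_mean_sketch is not a kernel function, and plain division stands in
for div64_s64()): when nr_samples is zero, the old mean is multiplied by
zero and the expression reduces to batch / nr_batch, which is what the
first branch computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: weighted merge of the running mean with a batch. */
static int64_t merged_mean_sketch(int64_t mean, int64_t nr_samples,
				  int64_t batch, int64_t nr_batch)
{
	int64_t total = nr_samples + nr_batch;

	/* The caller is assumed to have already rejected an empty batch. */
	assert(total != 0);

	return (mean * nr_samples + batch) / total;
}
```

With nr_samples == 0 the value passed as mean is irrelevant, so a single
expression covers both the first flush and every later one.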

>> Yes, that might be a good idea, since it doesn't cost us anything. For
>> the mq case, I'm hard pressed to think of areas where we could complete
>> IO in parallel on the same software queue. You'll never have a software
>> queue mapped to multiple hardware queues. So we should essentially be
>> serialized.
>
> For blk-mq, blk_mq_stat_add() is called in __blk_mq_complete_request(),
> which is often run from the interrupt handler, and the CPU serving the
> interrupt can be different from the submitting CPU for rq->mq_ctx. And there
> can be several CPUs handling the interrupts originating from the same sw queue.
>
> BTW, I don't actually object to this patch, but maybe we should add a
> comment about this kind of race, because in the future someone might find
> that the statistics have become inaccurate, and the comment would help
> them understand why or try to improve it.

I'm fine with documenting it. For the two use cases I have, I'm fine
with it not being 100% stable. For the vast majority of the windows,
it'll be just fine. I'll fix the divide-by-zero, though.

-- 
Jens Axboe
