From: Jens Axboe <axboe@fb.com> To: Jan Kara <jack@suse.cz> Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, hch@lst.de Subject: Re: [PATCH 6/8] block: add scalable completion tracking of requests Date: Wed, 9 Nov 2016 09:09:35 -0700 [thread overview] Message-ID: <3c7e3183-a7e3-4219-54ca-65c9f45b6d5b@fb.com> (raw) In-Reply-To: <20161109090157.GZ32353@quack2.suse.cz> On 11/09/2016 02:01 AM, Jan Kara wrote: > On Tue 08-11-16 08:25:52, Jens Axboe wrote: >> On 11/08/2016 06:30 AM, Jan Kara wrote: >>> On Tue 01-11-16 15:08:49, Jens Axboe wrote: >>>> For legacy block, we simply track them in the request queue. For >>>> blk-mq, we track them on a per-sw queue basis, which we can then >>>> sum up through the hardware queues and finally to a per device >>>> state. >>>> >>>> The stats are tracked in, roughly, 0.1s interval windows. >>>> >>>> Add sysfs files to display the stats. >>>> >>>> Signed-off-by: Jens Axboe <axboe@fb.com> >>> >>> This patch looks mostly good to me but I have one concern: You track >>> statistics in a fixed 134ms window, stats get cleared at the beginning of >>> each window. Now this can interact with the writeback window and latency >>> settings which are dynamic and settable from userspace - so if the >>> writeback code observation window gets set larger than the stats window, >>> things become strange since you'll likely miss quite some observations >>> about read latencies. So I think you need to make sure stats window is >>> always larger than writeback window. Or actually, why do you have something >>> like stats window and don't leave clearing of statistics completely to the >>> writeback tracking code? >> >> That's a good point, and there actually used to be a comment to that >> effect in the code. I think the best solution here would be to make the >> stats code mask available somewhere, and allow a consumer of the stats >> to request a larger window. >> >> Similarly, we could make the stat window be driven by the consumer, as >> you suggest. >> >> Currently there are two pending submissions that depend on the stats >> code. One is this writeback series, and the other one is the hybrid >> polling code. The latter does not really care about the window size as >> such, since it has no monitoring window of its own, and it wants the >> auto-clearing as well. >> >> I don't mind working on additions for this, but I'd prefer if we could >> layer them on top of the existing series instead of respinning it. >> There's considerable test time on the existing patchset. Would that work >> for you? Especially collapsing the stats and wbt windows would require >> some re-architecting. > > OK, that works for me. Actually, when thinking about this, I have one more > suggestion: Do we really want to expose the wbt window as a sysfs tunable? > I guess it is good for initial experiments but longer term having the wbt > window length be a function of target read latency might be better. > Generally you want the window length to be considerably larger than the > target latency but OTOH not too large so that the algorithm can react > reasonably quickly so that suggests it could really be autotuned (and we > scale the window anyway to adapt it to current situation). That's not a bad idea, I have thought about that as well before. We don't need the window tunable, and you are right, it can be a function of the desired latency. I'll hardwire the 100msec latency window for now and get rid of the exposed tunable. It's harder to remove sysfs files once they have made it into the kernel... >>> Also as a side note - nobody currently uses the mean value of the >>> statistics. It may be faster to track just sum and count so that mean can >>> be computed on request which will be presumably much more rare than current >>> situation where we recompute the mean on each batch update. Actually, that >>> way you could get rid of the batching as well I assume. >> >> That could be opt-in as well. The poll code uses it. And fwiw, it is >> exposed through sysfs as well. > > Yeah, my point was that just doing the division in response to sysfs read > or actual request to read the average is likely going to be less expensive > than having to do it on each batch completion (actually, you seem to have > that batching code only so that you don't have to do the division too > often). Whether my suggestion is right depends on how often polling code > actually needs to read the average... The polling code currently does it for every IO... That is not ideal for other purposes, I think I'm going to work on changing that to just keep the previous window available, so we only need to read it when the stats window changes. With the batching, I don't see the division as a problem in micro benchmarks. That's why I added the batching, because it did show up before. -- Jens Axboe
WARNING: multiple messages have this Message-ID (diff)
From: Jens Axboe <axboe@fb.com> To: Jan Kara <jack@suse.cz> Cc: <axboe@kernel.dk>, <linux-kernel@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>, <hch@lst.de> Subject: Re: [PATCH 6/8] block: add scalable completion tracking of requests Date: Wed, 9 Nov 2016 09:09:35 -0700 [thread overview] Message-ID: <3c7e3183-a7e3-4219-54ca-65c9f45b6d5b@fb.com> (raw) In-Reply-To: <20161109090157.GZ32353@quack2.suse.cz> On 11/09/2016 02:01 AM, Jan Kara wrote: > On Tue 08-11-16 08:25:52, Jens Axboe wrote: >> On 11/08/2016 06:30 AM, Jan Kara wrote: >>> On Tue 01-11-16 15:08:49, Jens Axboe wrote: >>>> For legacy block, we simply track them in the request queue. For >>>> blk-mq, we track them on a per-sw queue basis, which we can then >>>> sum up through the hardware queues and finally to a per device >>>> state. >>>> >>>> The stats are tracked in, roughly, 0.1s interval windows. >>>> >>>> Add sysfs files to display the stats. >>>> >>>> Signed-off-by: Jens Axboe <axboe@fb.com> >>> >>> This patch looks mostly good to me but I have one concern: You track >>> statistics in a fixed 134ms window, stats get cleared at the beginning of >>> each window. Now this can interact with the writeback window and latency >>> settings which are dynamic and settable from userspace - so if the >>> writeback code observation window gets set larger than the stats window, >>> things become strange since you'll likely miss quite some observations >>> about read latencies. So I think you need to make sure stats window is >>> always larger than writeback window. Or actually, why do you have something >>> like stats window and don't leave clearing of statistics completely to the >>> writeback tracking code? >> >> That's a good point, and there actually used to be a comment to that >> effect in the code. I think the best solution here would be to make the >> stats code mask available somewhere, and allow a consumer of the stats >> to request a larger window. >> >> Similarly, we could make the stat window be driven by the consumer, as >> you suggest. >> >> Currently there are two pending submissions that depend on the stats >> code. One is this writeback series, and the other one is the hybrid >> polling code. The latter does not really care about the window size as >> such, since it has no monitoring window of its own, and it wants the >> auto-clearing as well. >> >> I don't mind working on additions for this, but I'd prefer if we could >> layer them on top of the existing series instead of respinning it. >> There's considerable test time on the existing patchset. Would that work >> for you? Especially collapsing the stats and wbt windows would require >> some re-architecting. > > OK, that works for me. Actually, when thinking about this, I have one more > suggestion: Do we really want to expose the wbt window as a sysfs tunable? > I guess it is good for initial experiments but longer term having the wbt > window length be a function of target read latency might be better. > Generally you want the window length to be considerably larger than the > target latency but OTOH not too large so that the algorithm can react > reasonably quickly so that suggests it could really be autotuned (and we > scale the window anyway to adapt it to current situation). That's not a bad idea, I have thought about that as well before. We don't need the window tunable, and you are right, it can be a function of the desired latency. I'll hardwire the 100msec latency window for now and get rid of the exposed tunable. It's harder to remove sysfs files once they have made it into the kernel... >>> Also as a side note - nobody currently uses the mean value of the >>> statistics. It may be faster to track just sum and count so that mean can >>> be computed on request which will be presumably much more rare than current >>> situation where we recompute the mean on each batch update. Actually, that >>> way you could get rid of the batching as well I assume. >> >> That could be opt-in as well. The poll code uses it. And fwiw, it is >> exposed through sysfs as well. > > Yeah, my point was that just doing the division in response to sysfs read > or actual request to read the average is likely going to be less expensive > than having to do it on each batch completion (actually, you seem to have > that batching code only so that you don't have to do the division too > often). Whether my suggestion is right depends on how often polling code > actually needs to read the average... The polling code currently does it for every IO... That is not ideal for other purposes, I think I'm going to work on changing that to just keep the previous window available, so we only need to read it when the stats window changes. With the batching, I don't see the division as a problem in micro benchmarks. That's why I added the batching, because it did show up before. -- Jens Axboe
next prev parent reply other threads:[~2016-11-09 16:09 UTC|newest] Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top 2016-11-01 21:08 [PATCHSET] Throttled buffered writeback Jens Axboe 2016-11-01 21:08 ` [PATCH 1/8] block: add WRITE_BACKGROUND Jens Axboe 2016-11-02 14:55 ` Christoph Hellwig 2016-11-02 16:22 ` Jens Axboe 2016-11-05 22:27 ` Jan Kara 2016-11-01 21:08 ` [PATCH 2/8] writeback: add wbc_to_write_flags() Jens Axboe 2016-11-02 14:56 ` Christoph Hellwig 2016-11-01 21:08 ` [PATCH 3/8] writeback: mark background writeback as such Jens Axboe 2016-11-02 14:56 ` Christoph Hellwig 2016-11-05 22:26 ` Jan Kara 2016-11-01 21:08 ` [PATCH 4/8] writeback: track if we're sleeping on progress in balance_dirty_pages() Jens Axboe 2016-11-02 14:57 ` Christoph Hellwig 2016-11-02 14:59 ` Jens Axboe 2016-11-02 14:59 ` Jens Axboe 2016-11-08 13:02 ` Jan Kara 2016-11-01 21:08 ` [PATCH 5/8] block: add code to track actual device queue depth Jens Axboe 2016-11-02 14:59 ` Christoph Hellwig 2016-11-02 15:02 ` Jens Axboe 2016-11-02 15:02 ` Jens Axboe 2016-11-02 16:40 ` Johannes Thumshirn 2016-11-02 16:40 ` Johannes Thumshirn 2016-11-05 22:37 ` Jan Kara 2016-11-01 21:08 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe 2016-11-08 13:30 ` Jan Kara 2016-11-08 15:25 ` Jens Axboe 2016-11-09 9:01 ` Jan Kara 2016-11-09 16:09 ` Jens Axboe [this message] 2016-11-09 16:09 ` Jens Axboe 2016-11-09 19:52 ` Jens Axboe 2016-11-10 19:38 ` Jan Kara 2016-11-12 5:19 ` Jens Axboe 2016-11-01 21:08 ` [PATCH 7/8] blk-wbt: add general throttling mechanism Jens Axboe 2016-11-08 13:39 ` Jan Kara 2016-11-08 15:41 ` Jens Axboe 2016-11-09 8:40 ` Jan Kara 2016-11-09 16:07 ` Jens Axboe 2016-11-09 19:52 ` Jens Axboe 2016-11-10 19:36 ` Jan Kara 2016-11-10 0:00 ` Dave Chinner 2016-11-01 21:08 ` [PATCH 8/8] block: hook up writeback throttling Jens Axboe 2016-11-08 13:42 ` Jan Kara 2016-11-08 15:16 ` Jens Axboe -- strict thread matches above, loose matches on Subject: below -- 2016-10-26 20:52 [PATCHSET] block: buffered " Jens Axboe 2016-10-26 20:52 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe 2016-09-07 14:46 [PATCH 0/8] Throttled background buffered writeback v7 Jens Axboe 2016-09-07 14:46 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe 2016-08-31 17:05 [PATCHSET v6] Throttled background buffered writeback Jens Axboe 2016-08-31 17:05 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe 2016-04-26 15:55 [PATCHSET v5] Make background writeback great again for the first time Jens Axboe 2016-04-26 15:55 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe 2016-05-05 7:52 ` Ming Lei
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3c7e3183-a7e3-4219-54ca-65c9f45b6d5b@fb.com \ --to=axboe@fb.com \ --cc=axboe@kernel.dk \ --cc=hch@lst.de \ --cc=jack@suse.cz \ --cc=linux-block@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.