linux-kernel.vger.kernel.org archive mirror
From: Randy Dunlap <rdunlap@infradead.org>
To: Rachit Agarwal <rach4x0r@gmail.com>, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>
Cc: Rachit Agarwal <ragarwal@cornell.edu>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	Ming Lei <ming.lei@redhat.com>,
	Jaehyun Hwang <jaehyun.hwang@cornell.edu>,
	Qizhe Cai <qc228@cornell.edu>,
	Midhul Vuppalapati <mvv25@cornell.edu>,
	Sagi Grimberg <sagi@lightbitslabs.com>,
	Shrijeet Mukherjee <shrijeet@gmail.com>,
	David Ahern <dsahern@gmail.com>
Subject: Re: [PATCH v2] iosched: Add i10 I/O Scheduler
Date: Fri, 4 Dec 2020 12:01:18 -0800
Message-ID: <593f6a3b-6e78-e4b3-c808-b9e452e6d05b@infradead.org>
In-Reply-To: <20201130201927.84846-1-rach4x0r@gmail.com>

On 11/30/20 12:19 PM, Rachit Agarwal wrote:
> From: Rachit Agarwal <ragarwal@cornell.edu>
> 

Hi,  {reusing bits}

> ---
>  Documentation/block/i10-iosched.rst |  79 ++++++
>  block/Kconfig.iosched               |   8 +
>  block/Makefile                      |   1 +
>  block/i10-iosched.c                 | 471 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 559 insertions(+)
>  create mode 100644 Documentation/block/i10-iosched.rst
>  create mode 100644 block/i10-iosched.c


> diff --git a/Documentation/block/i10-iosched.rst b/Documentation/block/i10-iosched.rst
> new file mode 100644
> index 0000000..661b5d5
> --- /dev/null
> +++ b/Documentation/block/i10-iosched.rst
> @@ -0,0 +1,79 @@
> +==========================
> +i10 I/O scheduler overview
> +==========================
> +
> +I/O batching is beneficial for optimizing IOPS and throughput for various
> +applications. For instance, several kernel block drivers would benefit from
> +batching, including mmc [1] and tcp-based storage drivers like nvme-tcp [2,3].

                       MMC         TCP-based

> +While we have support for batching dispatch [4], we need an I/O scheduler to
> +efficiently enable batching. Such a scheduler is particularly interesting for
> +disaggregated (remote) storage, where the access latency of disaggregated remote
> +storage may be higher than local storage access; thus, batching can significantly
> +help in amortizing the remote access latency while increasing the throughput.
> +
> +This patch introduces the i10 I/O scheduler, which performs batching per hctx in
> +terms of #requests, #bytes, and timeouts (at microseconds granularity). i10 starts
> +dispatching only when #requests or #bytes is larger than a threshold or when a timer
> +expires. After that, batching dispatch [3] would happen, allowing batching at device
> +drivers along with "bd->last" and ".commit_rqs".
> +
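
As an aside, the dispatch rule described in the paragraph above can be
sketched in plain userspace C. This is only an illustration and is not code
from the patch: the names batch_state, batch_add() and
batch_should_dispatch() are hypothetical, the thresholds are the defaults
quoted in the documentation text (16 requests, 64KB, 50us), and both the
use of ">=" for "larger than a threshold" and the choice to start the timer
at the first queued request are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Defaults quoted in the documentation text; not taken from the patch code. */
#define BATCH_NR         16u            /* requests per batch */
#define BATCH_BYTES      (64u * 1024u)  /* bytes per batch */
#define BATCH_TIMEOUT_US 50u            /* microseconds */

/* Hypothetical per-queue state (one instance per hctx in the real design). */
struct batch_state {
	unsigned int nr_reqs;       /* requests queued so far */
	unsigned int nr_bytes;      /* bytes queued so far */
	uint64_t first_enqueue_us;  /* time the current batch was started */
};

/* Account for one newly queued request of 'bytes' bytes at time 'now_us'. */
static void batch_add(struct batch_state *b, unsigned int bytes,
		      uint64_t now_us)
{
	assert(b != NULL);
	if (b->nr_reqs == 0)
		b->first_enqueue_us = now_us;  /* assumed: timer starts here */
	b->nr_reqs++;
	b->nr_bytes += bytes;
}

/*
 * Dispatch once any of the three limits is reached: request count,
 * byte count, or time elapsed since the batch was started.
 */
static bool batch_should_dispatch(const struct batch_state *b,
				  uint64_t now_us)
{
	if (b->nr_reqs == 0)
		return false;  /* nothing to dispatch */
	return b->nr_reqs >= BATCH_NR ||
	       b->nr_bytes >= BATCH_BYTES ||
	       now_us - b->first_enqueue_us >= BATCH_TIMEOUT_US;
}
```

With these assumed defaults, a single 4KB request queued at t=1000us is held
back at t=1010us and released at t=1050us, while 16 small requests or one
64KB request are released immediately.
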
> +The i10 I/O scheduler builds upon recent work on [6]. We have tested the i10 I/O
> +scheduler with nvme-tcp optimizaitons [2,3] and batching dispatch [4], varying number

                           optimizations

> +of cores, varying read/write ratios, and varying request sizes, and with NVMe SSD and
> +RAM block device. For remote NVMe SSDs, the i10 I/O scheduler achieves ~60% improvements
> +in terms of IOPS per core over "noop" I/O scheduler, while trading off latency at lower loads.
> +These results are available at [5], and many additional results are presented in [6].
> +
> +While other schedulers may also batch I/O (e.g., mq-deadline), the optimization target
> +in the i10 I/O scheduler is throughput maximization. Hence there is no latency target
> +nor a need for a global tracking context, so a new scheduler is needed rather than
> +to build this functionality to an existing scheduler.
> +
> +We have default values for batching thresholds (e.g., 16 for #requests, 64KB for #bytes,
> +and 50us for timeout). These default values are based on sensitivity tests in [6].
> +For many workloads, especially those with low loads, the default values of i10 scheduler
> +may not provide the optimal operating point on the latency-throughput curve. To that end,
> +the scheduler adaptively sets the batch size depending on number of outstanding requests
> +and the triggering of timeouts, as measured in the block layer. Much work needs to be done
> +to design better adaptation algorithms, especially when the loads are neither too high
> +nor too low. This constitutes interesting future work. In addition, for our future work, we
> +plan to extend the scheduler to support isolation in multi-tenant deployments
> +(to simultaneously achieve low tail latency for latency-sensitive applications and high
> +throughput for throughput-bound applications).
> +
> +References
> +[1] https://lore.kernel.org/linux-block/cover.1587888520.git.baolin.wang7@gmail.com/T/#mc48a8fb6069843827458f5fea722e1179d32af2a
> +[2] https://git.infradead.org/nvme.git/commit/122e5b9f3d370ae11e1502d14ff5c7ea9b144a76
> +[3] https://git.infradead.org/nvme.git/commit/86f0348ace1510d7ac25124b096fb88a6ab45270
> +[4] https://lore.kernel.org/linux-block/20200630102501.2238972-1-ming.lei@redhat.com/
> +[5] https://github.com/i10-kernel/upstream-linux/blob/master/i10-evaluation.pdf
> +[6] https://www.usenix.org/conference/nsdi20/presentation/hwang
> +
> +==========================
> +i10 I/O scheduler tunables
> +==========================

[snip]


thanks.
-- 
~Randy

