From: Sagi Grimberg <sagi@grimberg.me>
To: Ming Lei <ming.lei@redhat.com>, Rachit Agarwal <rach4x0r@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>, Qizhe Cai <qc228@cornell.edu>,
	Rachit Agarwal <ragarwal@cornell.edu>,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org,
	Midhul Vuppalapati <mvv25@cornell.edu>,
	Jaehyun Hwang <jaehyun.hwang@cornell.edu>,
	Rachit Agarwal <ragarwal@cs.cornell.edu>,
	Keith Busch <kbusch@kernel.org>,
	Sagi Grimberg <sagi@lightbitslabs.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] iosched: Add i10 I/O Scheduler
Date: Fri, 13 Nov 2020 12:58:10 -0800
Message-ID: <44d5bcb0-689e-50c8-fa8e-a7d2b569f75c@grimberg.me>
In-Reply-To: <20201113145912.GA1074955@T590>


> blk-mq actually has a built-in batching (or sort of) mechanism, which is
> enabled if the hw queue is busy (hctx->dispatch_busy > 0). We use EWMA to
> compute hctx->dispatch_busy, and it is adaptive, even though the
> implementation is quite coarse. But there should be much room to improve, IMO.

You are correct; however, nvme-tcp should be getting to dispatch_busy > 0,
IIUC.
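
For reference, that EWMA lives in blk_mq_update_dispatch_busy() in
block/blk-mq.c. A simplified sketch of the update (the weight/factor values
match the in-tree defines; surrounding details are elided):

    #define BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT  8
    #define BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR  4

    static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx,
                                            bool busy)
    {
            unsigned int ewma = hctx->dispatch_busy;

            if (!ewma && !busy)
                    return;

            /* decay the old value, then mix in the new busy sample */
            ewma *= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT - 1;
            if (busy)
                    ewma += 1 << BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR;
            ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;

            hctx->dispatch_busy = ewma;
    }

So dispatch_busy drifts toward 16 while dispatch keeps finding the queue
busy and decays back toward 0 otherwise, which is the adaptive (if coarse)
signal Ming describes.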

> It has been reported that this approach improves single-queue (SQ) high-end
> SCSI SSDs very much [1], and MMC performance improves too [2].
> 
> [1] https://lore.kernel.org/linux-block/3cc3e03901dc1a63ef32e036182521af@mail.gmail.com/
> [2] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/

Yes, the guys paid attention to the MMC-related improvements that you
made.

>> The i10 I/O scheduler builds upon recent work [6]. We have tested the i10 I/O
>> scheduler with the nvme-tcp optimizations [2,3] and batching dispatch [4],
>> varying numbers of cores, varying read/write ratios, and varying request sizes,
>> with both an NVMe SSD and a RAM block device. For NVMe SSDs, the i10 I/O
>> scheduler achieves ~60% improvement in IOPS per core over the "noop" I/O
>> scheduler. These results are available at [5], and many additional results are
>> presented in [6].
> 
> In the case of the none scheduler, the nvme driver basically won't provide any
> queue busy feedback, so the built-in batching dispatch simply doesn't work.

Exactly.

> The kyber scheduler uses I/O latency feedback to throttle and build I/O
> batches; can you compare i10 with kyber on nvme/nvme-tcp?

I assume it should be simple to get; I'll let Rachit/Jaehyun comment.

>> While other schedulers may also batch I/O (e.g., mq-deadline), the optimization
>> target in the i10 I/O scheduler is throughput maximization. Hence there is
>> neither a latency target nor a need for a global tracking context, so a new
>> scheduler is needed rather than building this functionality into an existing
>> scheduler.
>>
>> We currently use fixed default values as batching thresholds (e.g., 16 for #requests,
>> 64KB for #bytes, and 50us for timeout). These default values are based on sensitivity
>> tests in [6]. For our future work, we plan to support adaptive batching according to
> 
> Frankly speaking, hardcoded values like 16 #requests or 64KB may not work
> everywhere, and production environments can be much more complicated than your
> sensitivity tests. If possible, please start with adaptive batching.

That was my feedback as well, for sure. But given that this is a
scheduler one would opt in to anyway, that won't be a must-have
initially. I'm not sure if the guys have made progress with this yet;
I'll let them comment.
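
For readers following the thresholds discussion: the defaults amount to a
doorbell that rings when any of three limits trips. A minimal sketch of the
idea (hypothetical names throughout, not the actual i10 patch code):

    /* Illustrative only -- these macro and field names are made up. */
    #define I10_DEF_BATCH_NR      16              /* #requests threshold */
    #define I10_DEF_BATCH_BYTES   (64 * 1024)     /* #bytes threshold */
    #define I10_DEF_BATCH_USECS   50              /* timeout threshold */

    struct i10_batch {
            unsigned int nr_reqs;   /* requests queued so far */
            unsigned int nr_bytes;  /* bytes queued so far */
    };

    /* Dispatch the whole batch once either size threshold trips; a
     * timer firing after I10_DEF_BATCH_USECS covers the idle case so
     * a trickle of I/O is never held back indefinitely. */
    static bool i10_batch_ready(const struct i10_batch *b)
    {
            return b->nr_reqs >= I10_DEF_BATCH_NR ||
                   b->nr_bytes >= I10_DEF_BATCH_BYTES;
    }

Adaptive batching would presumably adjust these limits at runtime from
feedback (e.g., observed latency or queue depth) rather than using
compile-time constants.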
