Re: [PATCH v3 00/14] bfq: introduce bfq.ioprio for cgroup

From: Tejun Heo <tj@kernel.org>
To: brookxu <brookxu.cn@gmail.com>
Cc: paolo.valente@linaro.org, axboe@kernel.dk,
	linux-block@vger.kernel.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/14] bfq: introduce bfq.ioprio for cgroup
Date: Sun, 4 Apr 2021 12:09:29 -0400
Message-ID: <YGnkuWYKeK7C8/Za@mtj.duckdns.org>
In-Reply-To: <cover.1616649216.git.brookxu@tencent.com>

Hello,

On Thu, Mar 25, 2021 at 02:57:44PM +0800, brookxu wrote:
> INTERFACE:
> 
> The bfq.ioprio interface now is available for cgroup v1 and cgroup
> v2. Users can configure the ioprio for cgroup through this
> interface, as shown below:
> 
> echo "1 2"> blkio.bfq.ioprio
> 
> The above two values respectively represent the values of ioprio
> class and ioprio for cgroup.
> 
> EXPERIMENT:
> 
> The test process is as follows:
> # prepare data disk
> mount /dev/sdb /data1
> 
> # prepare IO scheduler
> echo bfq > /sys/block/sdb/queue/scheduler
> echo 0 > /sys/block/sdb/queue/iosched/low_latency
> echo 1 > /sys/block/sdb/queue/iosched/better_fairness
> 
> It is worth noting here that nr_requests limits the number of
> requests, and it does not perceive priority. If nr_requests is
> too small, it may cause a serious priority inversion problem.
> Therefore, we can increase the size of nr_requests based on
> the actual situation.
> 
> # create cgroup v1 hierarchy
> cd /sys/fs/cgroup/blkio
> mkdir rt be0 be1 be2 idle
> 
> # prepare cgroup
> echo "1 0" > rt/blkio.bfq.ioprio
> echo "2 0" > be0/blkio.bfq.ioprio
> echo "2 4" > be1/blkio.bfq.ioprio
> echo "2 7" > be2/blkio.bfq.ioprio
> echo "3 0" > idle/blkio.bfq.ioprio
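
For anyone reading along, each pair in the quoted setup is "<class> <level>",
and the class numbers appear to follow the usual ioprio classes (1 = RT,
2 = BE, 3 = IDLE) with levels 0-7. A minimal sketch of that mapping, assuming
that reading is right; describe_ioprio is a made-up helper name and is not
part of the patch set:

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch set): decode a "<class> <level>"
# pair as written to blkio.bfq.ioprio in the quoted example.
# Assumption: class numbers follow the standard ioprio classes.
describe_ioprio() {
    class=$1
    level=$2
    case "$class" in
        1) name=RT ;;
        2) name=BE ;;
        3) name=IDLE ;;
        *) echo "unknown class $class" >&2; return 1 ;;
    esac
    # Levels run from 0 (highest) to 7 (lowest) within a class.
    if [ "$level" -lt 0 ] || [ "$level" -gt 7 ]; then
        echo "level $level out of range 0-7" >&2
        return 1
    fi
    echo "$name/$level"
}

# The be1 group from the quoted setup.
describe_ioprio 2 4
```

This only labels the pairs; it says nothing about what the scheduler should
do when groups with different pairs compete.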

Here are some concerns:

* The main benefit of bfq over cfq, at least, was that its behavior model
  was defined more clearly. It was possible to describe the control model
  in a way which made semantic sense. The main problem I see with this
  proposal is that the interface grew out of the specifics of the current
  implementation, and I'm having a hard time understanding what the end
  results should be with different configuration combinations.

* This might work around some scheduling latency issues, but I have a
  hard time imagining it being able to address actual QoS issues. For
  example, on a lot of SSDs, without absolute throttling, device side
  latencies can spike by multiple orders of magnitude and no prioritization
  on the scheduler side is gonna help once such a state is reached. Here,
  there are no robust mechanisms or measurement/control units defined to
  address that. In fact, the above suggestion to increase the nr_requests
  limit will make priority inversions on the device and post-elevator side
  way more likely and severe.
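
To see what a device is currently running with before touching nr_requests,
the active scheduler and queue depth can be read from sysfs. A minimal
sketch, assuming the standard /sys/block/<dev>/queue layout; the function
name and the SYSFS_ROOT override are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print the scheduler line and nr_requests for a block
# device. SYSFS_ROOT is an assumption added so the function can be pointed
# at a test directory; it defaults to /sys.
show_queue() {
    q="${SYSFS_ROOT:-/sys}/block/$1/queue"
    [ -d "$q" ] || { echo "no such queue: $q" >&2; return 1; }
    # The scheduler file lists all available schedulers, active one in [].
    echo "scheduler: $(cat "$q/scheduler")"
    echo "nr_requests: $(cat "$q/nr_requests")"
}
```

With the device from the quoted test, that would be "show_queue sdb". It
only reports the elevator side and tells you nothing about how deep the
device's own queue is.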

So, maybe it helps with specific scenarios on some hardware, but given the
ad-hoc nature, I don't think it justifies all the extra interface additions.
My suggestion would be slimming it down to bare essentials and making the
user interface part as minimal as possible.

Thanks.

-- 
tejun