From: Paolo Valente <paolo.valente@linaro.org>
To: Jan Kara <jack@suse.cz>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	linux-block@vger.kernel.org, "Michal Koutný" <mkoutny@suse.com>
Subject: Re: [PATCH 0/3 v2] bfq: Limit number of allocated scheduler tags per cgroup
Date: Fri, 27 Aug 2021 12:07:20 +0200	[thread overview]
Message-ID: <751F4AB5-1FDF-45B0-88E1-0C76ED1AAAD6@linaro.org> (raw)
In-Reply-To: <20210715132047.20874-1-jack@suse.cz>



> Il giorno 15 lug 2021, alle ore 15:30, Jan Kara <jack@suse.cz> ha scritto:
> 
> Hello!
> 

Hi!

> Here is the second revision of my patches to fix how bfq weights apply on
> cgroup throughput.

I don't remember whether I replied to your first version.  Anyway,
thanks for this important contribution.

> This version has only one change, fixing how we compute the number of tags
> that should be available to a cgroup. The previous version didn't combine
> weights at several levels correctly for deeper hierarchies. It is somewhat
> unfortunate that for really deep cgroup hierarchies we would now do memory
> allocation inside bfq_limit_depth(). I have an idea how we could avoid that
> if the rest of the approach proves OK, so please don't concentrate too much
> on that detail.
> 
> Changes since v1:
> * Fixed computation of appropriate proportion of scheduler tags for a cgroup
>  to work with deeper cgroup hierarchies.
> 
> Original cover letter:
> 
> I was looking into why cgroup weights do not have any measurable impact on
> writeback throughput from different cgroups. This is actually a regression
> from CFQ, where things work more or less OK and weights have roughly the
> impact they should. The problem can be reproduced e.g. by running the
> following simple fio job in two cgroups with different weights:
> 
> [writer]
> directory=/mnt/repro/
> numjobs=1
> rw=write
> size=8g
> time_based
> runtime=30
> ramp_time=10
> blocksize=1m
> direct=0
> ioengine=sync
> 
> I can observe there's no significant difference in the amount of data written
> from the different cgroups even though their weights are in, say, a 1:3 ratio.
> 
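
For reference, this is a minimal sketch of the two-cgroup setup such a job
assumes, using cgroup v2 with the io controller and BFQ on the target disk.
The device name, the cgroup names and the 100/300 weights are placeholders,
and the job file above is assumed to be saved as writer.fio:

  # enable BFQ on the disk backing /mnt/repro (placeholder device name)
  echo bfq > /sys/block/sdX/queue/scheduler

  # two cgroups with weights in a 1:3 ratio
  echo +io > /sys/fs/cgroup/cgroup.subtree_control
  mkdir /sys/fs/cgroup/light /sys/fs/cgroup/heavy
  echo 100 > /sys/fs/cgroup/light/io.bfq.weight
  echo 300 > /sys/fs/cgroup/heavy/io.bfq.weight

  # run one instance of the job in each cgroup (bash: $BASHPID is the
  # subshell's pid; give each instance its own directory= so the two
  # jobs don't write to the same file)
  ( echo $BASHPID > /sys/fs/cgroup/light/cgroup.procs; fio writer.fio ) &
  ( echo $BASHPID > /sys/fs/cgroup/heavy/cgroup.procs; fio writer.fio ) &
  wait
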
> After some debugging I've understood the dynamics of the system. There are two
> issues:
> 
> 1) The number of scheduler tags needs to be significantly larger than the
> number of device tags. Otherwise there are not enough requests waiting in BFQ
> to be dispatched to the device, and thus there's nothing to schedule on.
> 
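
For anyone checking this on their own setup, both numbers are visible in
sysfs, and the scheduler tag pool can be raised from there as well; a sketch
with a placeholder device name (the queue_depth path shown is the SCSI one):

  # scheduler tags per hardware queue
  cat /sys/block/sdX/queue/nr_requests
  # device queue depth, for comparison (SCSI devices)
  cat /sys/block/sdX/device/queue_depth
  # raise the scheduler tag pool, e.g. to several times the device depth
  echo 256 > /sys/block/sdX/queue/nr_requests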

Before discussing your patches in detail, I need a little help on this
point.  You state that the number of scheduler tags must be larger
than the number of device tags.  So, I expected some of your patches
to address this issue somehow, e.g., by increasing the number of
scheduler tags.  Yet I have not found such a change.  Did I miss
something?

Thanks,
Paolo

> 2) Even with enough scheduler tags, writers from two cgroups eventually start
> contending on scheduler tag allocation. These are served on a first-come,
> first-served basis, so writers from both cgroups feed requests into bfq at
> approximately the same speed. Since bfq prefers IO from the heavier cgroup,
> that IO is submitted and completed faster, and eventually we end up in a
> situation where there's no IO from the heavier cgroup in bfq and all scheduler
> tags are consumed by requests from the lighter cgroup. At that point bfq just
> dispatches lots of the IO from the lighter cgroup since there's no contender
> for disk throughput. As a result, the observed throughput for both cgroups is
> the same.
> 
> This series fixes this problem by accounting how many scheduler tags are
> allocated for each cgroup, and if a cgroup has more tags allocated than its
> fair share (based on weights) in its service tree, we heavily limit the
> scheduler tag bitmap depth for it so that it is not able to starve other
> cgroups of scheduler tags.
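
To put a number on "fair share": one plausible reading of combining weights
at several levels (as v2 does for deeper hierarchies) is to multiply the
cgroup's weight fraction at each level of the hierarchy. A toy example with
made-up numbers: 256 scheduler tags, and a cgroup holding 300 of 400 total
weight at the first level and 100 of 100 at the second:

  # 256 * (300/400) * (100/100) -> 192 tags as this cgroup's fair share
  echo $(( 256 * 300 / 400 * 100 / 100 ))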
> 
> What do people think about this?
> 
> 								Honza
> 
> Previous versions:
> Link: http://lore.kernel.org/r/20210712171146.12231-1-jack@suse.cz # v1

