From: Jan Kara <jack@suse.cz>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jan Kara <jack@suse.cz>, linux-block@vger.kernel.org
Subject: Re: False waker detection in BFQ
Date: Fri, 21 May 2021 15:10:34 +0200
Message-ID: <20210521131034.GL18952@quack2.suse.cz>
In-Reply-To: <FFFA8EE2-3635-4873-9F2C-EC3206CC002B@linaro.org>
On Thu 20-05-21 17:05:45, Paolo Valente wrote:
> > On 5 May 2021, at 18:20, Jan Kara <jack@suse.cz> wrote:
> >
> > Hi Paolo!
> >
> > I have two processes doing direct IO writes like:
> >
> > dd if=/dev/zero of=/mnt/file$i bs=128k oflag=direct count=4000M
> >
> > Now each of these processes belongs to a different cgroup and has a
> > different bfq.weight. I was looking into why these processes do not split
> > bandwidth according to BFQ weights. Or actually the bandwidth is split
> > accordingly initially but eventually degrades into 50/50 split. After some
> > debugging I've found out that, by chance, one of the processes gets detected
> > as a waker of the other process and at that point we lose isolation
> > between the two cgroups. This pretty reliably happens sometime during the
> > run of these two processes on my test VM. So can we tweak the waker logic
> > to reduce the chances for false positives? Essentially when there are only
> > two processes doing heavy IO against the device, the logic in
> > bfq_check_waker() is such that they are very likely to eventually become
> > wakers of one another. AFAICT the only condition that needs to get
> > fulfilled is that they need to submit IO within 4 ms of the completion of
> > IO of the other process 3 times.
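
For illustration, the condition described above can be modelled in a few
lines of Python. This is a simplified sketch based only on the description
in this mail (I/O submitted within 4 ms of the other queue's completion,
3 times), not the actual bfq_check_waker() code. The Queue class, the
on_submit() helper and the constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WAKER_WINDOW_NS = 4_000_000  # 4 ms window, as described above
DETECTIONS_NEEDED = 3        # number of hits needed, as described above


@dataclass(eq=False)
class Queue:
    """Hypothetical stand-in for a per-process BFQ queue."""
    name: str
    tentative_waker: Optional["Queue"] = None
    detections: int = 0
    waker: Optional["Queue"] = None


def on_submit(q, last_completed, last_completion_ns, now_ns):
    """Update q's waker state when q submits I/O at time now_ns.

    last_completed is the queue whose request completed most recently,
    at time last_completion_ns.
    """
    if last_completed is None or last_completed is q:
        return
    if q.waker is last_completed:
        return  # already established
    if now_ns - last_completion_ns >= WAKER_WINDOW_NS:
        return  # too late to count as a wakeup
    if q.tentative_waker is not last_completed:
        # New candidate: restart the count.
        q.tentative_waker = last_completed
        q.detections = 1
    else:
        q.detections += 1
    if q.detections >= DETECTIONS_NEEDED:
        q.waker = last_completed
        q.tentative_waker = None
        q.detections = 0


a, b = Queue("a"), Queue("b")
for i in range(3):
    completion = i * 10_000_000
    # a submits 1 ms after b's completion, three times in a row
    on_submit(a, b, completion, completion + 1_000_000)
```

With only two busy queues, the most recently completed queue is almost
always "the other one", so in this model nothing but timing stands between
two unrelated processes and a waker relationship.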
>
> as I happened to tell you months ago, I feared that some corner case
> was likely to show up eventually. Actually, I was even more pessimistic
> than reality proved to be :)
:)
> I'm sorry for my delay, but I've had to think about this issue for a
> while. Being too strict would easily rule out journald as a waker for
> processes belonging to a different group.
>
> So, what do you think of this proposal: add the extra filter that a
> waker must belong to the same group as the woken process, or, at most, to the
> root group?
I thought you would suggest that :) Well, I'd probably allow a waker-wakee
relationship if the two cgroups are in an 'ancestor' - 'successor'
relationship, not necessarily only root cgroup vs some cgroup. That being
said, in my opinion it is just a poor man's band-aid fixing this particular
setup. It will not fix e.g. a similar problem when those two processes are
in the same cgroup but have, say, different IO priorities.
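
As a rough sketch of the filter discussed above (accept a waker only when
one cgroup is an ancestor of the other, or both are the same), assuming
cgroups are identified by their hierarchy paths. The function names are
hypothetical and this is not BFQ code.

```python
def is_ancestor_or_self(ancestor: str, path: str) -> bool:
    """True if cgroup path `ancestor` equals `path` or lies above it."""
    a = [p for p in ancestor.split("/") if p]
    d = [p for p in path.split("/") if p]
    # Compare whole path components, so "/grp1" is not an ancestor of "/grp10".
    return d[:len(a)] == a


def waker_allowed(waker_cg: str, wakee_cg: str) -> bool:
    """Accept the pair only if the two cgroups lie on one branch of the tree."""
    return (is_ancestor_or_self(waker_cg, wakee_cg)
            or is_ancestor_or_self(wakee_cg, waker_cg))
```

Here waker_allowed("/", "/grp1") and waker_allowed("/grp1", "/grp1/sub")
hold, while waker_allowed("/grp1", "/grp2") does not. Two processes in the
same cgroup always pass, which is exactly why such a check would not help
with the IO priority case.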
The question is how we could do better. But so far I have no great idea
either.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR