From: Thorsten Leemhuis <regressions@leemhuis.info>
To: "regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch #forregzbot
Date: Sun, 16 Jan 2022 08:30:05 +0100
Message-ID: <25e38df8-729f-df8c-4d20-b39662d4d3b3@leemhuis.info>
In-Reply-To: <a61df03d-2e36-9e91-ff02-2f48eb660181@leemhuis.info>
For the record: the initial bisection might have been slightly misled,
and it's hard to find the real culprit. Let regzbot know about this:
#regzbot introduced v5.15..v5.16-rc1
People are working on this regression, but it will take a while to get
things sorted.
TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.
On 30.11.21 08:16, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> Top-posting for once, to make this easily accessible to everyone.
>
> Adding the regression mailing list to the list of recipients, as it
> should be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced 7fd7a9e0caba
> #regzbot ignore-activity
>
> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> to the report (the parent of this mail), then regzbot will automatically
> mark the regression as resolved once the fix lands in the appropriate
> tree. For more details about regzbot see footer.
>
> Sending this to everyone that got the initial report, to make all aware
> of the tracking. I also hope that messages like this motivate people to
> directly get at least the regression mailing list and ideally even
> regzbot involved when dealing with regressions, as messages like this
> wouldn't be needed then.
>
> Don't worry, I'll send further messages wrt this regression just to
> the lists (with a tag in the subject so people can filter them away), as
> long as they are intended just for regzbot. With a bit of luck no such
> messages will be needed anyway.
>
> Ciao, Thorsten, your Linux kernel regression tracker.
>
> ---
> Additional information about regzbot:
>
> If you want to know more about regzbot, check out its web-interface, the
> getting started guide, and/or the reference documentation:
>
> https://linux-regtracking.leemhuis.info/regzbot/
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
>
> The last two documents will explain how you can interact with regzbot
> yourself if you want to.
>
> Hint for reporters: when reporting a regression it's in your interest to
> tell #regzbot about it in the report, as that will ensure the regression
> gets on the radar of regzbot and the regression tracker. That's in your
> interest, as they will make sure the report won't fall through the
> cracks unnoticed.
>
> Hint for developers: you normally don't need to care about regzbot once
> it's involved. Fix the issue as you normally would, just remember to
> include a 'Link:' tag to the report in the commit message, as explained
> in Documentation/process/submitting-patches.rst
> That aspect was recently made more explicit in commit 1f57bd42b77c:
> https://git.kernel.org/linus/1f57bd42b77c
>
>
> On 29.11.21 18:03, Josef Bacik wrote:
>>
>> Our nightly performance testing found a performance regression when we rebased
>> our devel tree onto v5.16-rc. This took me a few days to bisect down, but this
>> patch
>>
>> 7fd7a9e0caba ("sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle")
>>
>> is the one that introduces the regression. My performance testing box is a
>> 2-socket machine with model name "Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz",
>> for a total of 12 CPUs reported in cpuinfo. It has 128GiB of RAM, and these perf
>> tests are being run against an SSD and a spinning rust device, but the regression
>> is consistent across both configurations. You can see the historical graph of
>> the completion latencies for this specific run
>>
>> http://toxicpanda.com/performance/emptyfiles500k_write_clat_ns_p99.png
>>
>> Or for something a little more braindead (untar firefox) you can see an increase
>> in the runtime
>>
>> http://toxicpanda.com/performance/untarfirefox_elapsed.png
>>
>> These two tests are single-threaded; the regression doesn't appear to affect
>> multi-threaded tests. For a simple reproducer you can simply download a tarball
>> of the firefox sources and untar it onto a clean btrfs file system. The time
>> before and after this commit goes up ~1-2 seconds on my machine. For a less
>> simple test you can create a clean btrfs file system and run
>>
>> fio --name emptyfiles500k --create_on_open=1 --nrfiles=31250 --readwrite=write \
>> --ioengine=filecreate --fallocate=none --filesize=4k \
>> --openfiles=1 --alloc-size 98304 --allrandrepeat=1 --randseed=12345 \
>> --directory <mount point>
>>
>> And you are looking for the "Write clat ns p99" metric. You'll see a 5-10%
>> increase in the latency time. If you want to run our tests directly it's
>> relatively easy to set up, you can clone the fsperf repo
>>
>> https://github.com/josefbacik/fsperf
>>
>> Then in the fsperf directory edit the local.cfg and add
>>
>> [main]
>> directory=/mnt/test
>>
>> [btrfs]
>> device=/dev/sdc
>> iosched=none
>> mkfs=mkfs.btrfs -f
>> mount=mount -o noatime
>>
>> And then run the following on the baseline kernel
>>
>> ./fsperf -p regression -c btrfs -n 10 emptyfiles500k
>>
>> This will run the test 10 times and save the results to the database. Then you
>> can boot into your changed kernel and run
>>
>> ./fsperf -p regression -c btrfs -n 10 -t emptyfiles500k
>>
>> This will run the test 10 times, take the average, compare it to the
>> baseline, and print out the values; you'll see the increased latency values there.
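
The comparison described above boils down to averaging the per-run values
and computing the relative change against the baseline. A minimal sketch in
shell, assuming the p99 clat values (in ns) for each run were collected by
hand; the helper names `mean` and `pct_change` and the sample numbers are
made up for illustration and are not part of fsperf or fio:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of fsperf or fio.

# mean VALUE...: print the arithmetic mean of the arguments.
mean() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# pct_change BASELINE CURRENT: print the relative change in percent.
# BASELINE must be non-zero.
pct_change() {
    awk -v old="$1" -v cur="$2" \
        'BEGIN { printf "%.2f\n", (cur - old) / old * 100 }'
}

# Made-up p99 clat samples in ns, one value per run.
baseline=$(mean 100000 102000 98000)
changed=$(mean 107000 109000 105000)
echo "baseline=${baseline} changed=${changed}" \
     "change=$(pct_change "$baseline" "$changed")%"
```

A result in the 5 to 10 range would correspond to the regression described
in this report; a single run is noisy, so averaging over several runs as
fsperf does is the safer comparison.
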
>>
>> I can reproduce this at will, if you want to just throw patches at me I'm happy
>> to run it and let you know what happens. I'm attaching my .config as well in
>> case that is needed, but the HZ and PREEMPT settings are
>>
>> CONFIG_NO_HZ_COMMON=y
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_NO_HZ=y
>> CONFIG_HZ_1000=y
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
>> CONFIG_PREEMPTION=y
>> CONFIG_PREEMPT_DYNAMIC=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_HAVE_PREEMPT_DYNAMIC=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>> CONFIG_DEBUG_PREEMPT=y
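
To see whether another machine's kernel was built with the same settings,
the options listed above can be checked against its config file. A rough
sketch, assuming a plain-text config such as /boot/config-$(uname -r) (many
distributions ship one, but not all); `check_config` is a hypothetical
helper name:

```shell
#!/bin/sh
# check_config FILE OPTION...: report whether each OPTION is set to "=y"
# in FILE. Returns non-zero if at least one option is not set to "=y".
# Hypothetical helper for illustration only.
check_config() {
    file=$1
    shift
    rc=0
    for opt in "$@"; do
        # -x: match the whole line, -F: treat the pattern as a fixed string
        if grep -qxF "${opt}=y" "$file"; then
            echo "ok      ${opt}"
        else
            echo "missing ${opt}"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check some of the timer and preemption options from the report.
# The config path is an assumption; adjust it for your system.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    check_config "$cfg" CONFIG_NO_HZ_COMMON CONFIG_NO_HZ_FULL \
        CONFIG_HZ_1000 CONFIG_PREEMPT CONFIG_PREEMPT_DYNAMIC || true
fi
```

Options built as modules ("=m") or left unset are both reported as missing
here, which is good enough for the boolean options in this list.
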
>>
>> Thanks,
>>
>> Josef
>>
>