From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Hans van Kranenburg <hans@knorrie.org>, linux-btrfs@vger.kernel.org
Subject: Re: Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2
Date: Wed, 29 Jul 2020 08:15:44 +0800
Message-ID: <66efcce0-b06c-a6b7-e99f-60be81cb0bd1@gmx.com>
In-Reply-To: <bfba7719-d976-d7e3-2956-f0f200de623f@knorrie.org>


On 2020/7/28 10:52 PM, Hans van Kranenburg wrote:
> On 7/28/20 3:52 AM, Qu Wenruo wrote:
>>
>>
>>> On 2020/7/28 8:51 AM, Qu Wenruo wrote:
>>>
>>>
>>>> On 2020/7/28 1:17 AM, Hans van Kranenburg wrote:
>>>> Hi!
>>>>
>>>> [...]
>>>>>
>>>>> Since it's almost a dead CPU-burning loop, the regular sleep-based
>>>>> lockup detector won't help much.
>>>>
>>>> Here's a flame graph of 180 seconds, taken from the kernel thread pid:
>>>>
>>>> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2020-07-27-perf-kworker-flush-btrfs.svg
>>>
>>> That's really awesome!
>>>
>>>>
>>>>> You can try trace events first to see which trace event gets executed
>>>>> most frequently, then try to add probe points to pin down the real cause.
>>>>
>>>> From the default collection, I already got the following, a few days
>>>> ago, by enabling find_free_extent and btrfs_cow_block:
>>>>
>>>> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2020-07-25-find_free_extent.txt
>>
>> This output is in fact pretty confusing, and may give you a false view
>> of the callers of find_free_extent().
>>
>> It always shows "EXTENT_TREE" as the owner, but that's due to a bad
>> design decision in the trace event.
>>
>> I have submitted a patch addressing it, and added you to the CC.
> 
> Oh right, thanks, that actually makes a lot of sense, lol.
> 
> I was misled because at first sight I was thinking, "yeah, obviously,
> where else than in the extent tree would you keep the administration
> of allocated blocks", and didn't yet realize that it made no sense.
> 
>> Would you mind re-capturing the events with that patch? That way we
>> could get a clearer idea of which tree sees the most concurrency.
> 
> Yes. I will do that and try to reproduce the symptoms with as few
> actions in parallel as possible.
> 
> I've been away most of the day today; I will see how far I get later,
> otherwise I'll continue tomorrow.
> 
> What you *can* see in the current output already, however, is that the
> kworker/u16:3-13887 thing is doing all DATA work, while many different
> processes (rsync, btrfs receive) all do the find_free_extent work for
> METADATA. That's already an interesting difference.

That's expected; don't forget that btrfs uses COW to modify its metadata.
After a data write reaches disk, btrfs will then modify the metadata to
reflect that write, which causes a metadata reservation.

The good news is that btrfs does not always need to COW metadata: if the
needed tree blocks have already been COWed and are not yet written to disk,
they can be modified in place.

So in theory, data reservations should happen more frequently than
metadata reservations.
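
If you want to check that while re-capturing the events, a quick-and-dirty
way is to count the find_free_extent events per process, split into DATA
vs METADATA. A rough sketch of the idea in Python (assumptions on my side:
tracefs is mounted at /sys/kernel/tracing, you run it as root, and the
event's printed flags contain the DATA/METADATA strings, which your
earlier capture suggests they do):

#!/usr/bin/env python3
# Rough sketch: tally btrfs:find_free_extent events per process,
# split into DATA / METADATA / SYSTEM reservations.
import collections
import re
import signal
import sys

TRACEFS = "/sys/kernel/tracing"   # assumed tracefs mount point
ENABLE = "events/btrfs/find_free_extent/enable"

counts = collections.Counter()

def dump(*_args):
    # Print per-(process, type) counts, most frequent first.
    for (comm, kind), n in counts.most_common():
        print(f"{comm:24s} {kind:9s} {n}")
    sys.exit(0)

signal.signal(signal.SIGINT, dump)

with open(f"{TRACEFS}/{ENABLE}", "w") as f:
    f.write("1")

# trace_pipe lines look roughly like:
#   kworker/u16:3-13887 [002] .... 1234.567: find_free_extent: ... flags=1(DATA)
comm_re = re.compile(r"^\s*(.+?)-\d+\s+\[\d+\]")
flag_re = re.compile(r"flags=\d+\((DATA|METADATA|SYSTEM)")

with open(f"{TRACEFS}/trace_pipe") as pipe:
    for line in pipe:
        comm = comm_re.search(line)
        flag = flag_re.search(line)
        if comm and flag:
            counts[(comm.group(1), flag.group(1))] += 1

Let it run while you reproduce the workload, then hit Ctrl-C to get the
per-process counts.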

> 
> So, one direction to look into is who is trying to grab that spin
> lock, since if it's per 'space' (which sounds logical, since the workers
> will never clash, because a whole block group belongs to only one 'space'),
> then I don't see why kworker/u16:18-11336 would spend 1/3 of its time
> busy-waiting on a lock while it's the only process working
> on METADATA.
> 
> But, I'll gather some more logs and pictures.
> 
> Do you know of a way (some RTFM pointer?) to debug who is contending on
> the same lock? I haven't researched that yet.

I have a bcc-based script that does that kind of analysis, but it's mostly
for tree blocks:
https://github.com/adam900710/btrfs-profiler/blob/master/tree_lock_wait.py

Feel free to use that as a base to develop your own tool.
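
The core of it is just a kprobe/kretprobe pair on the tree locking
functions, recording how long each thread spends acquiring them. A
stripped-down sketch of that approach (not the actual script; it assumes
bcc is installed and that btrfs_tree_lock()/btrfs_tree_read_lock() are the
symbols you care about):

#!/usr/bin/env python3
# Minimal bcc sketch: log2 histogram of time spent acquiring btrfs tree locks.
import time
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);      // tid -> entry timestamp
BPF_HISTOGRAM(wait_us);         // log2 histogram of lock acquisition time (us)

int lock_enter(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int lock_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (!tsp)
        return 0;
    u64 delta = bpf_ktime_get_ns() - *tsp;
    wait_us.increment(bpf_log2l(delta / 1000));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
for fn in ("btrfs_tree_lock", "btrfs_tree_read_lock"):
    b.attach_kprobe(event=fn, fn_name="lock_enter")
    b.attach_kretprobe(event=fn, fn_name="lock_return")

print("Tracing btrfs tree lock acquisition... hit Ctrl-C to end.")
try:
    time.sleep(999999)
except KeyboardInterrupt:
    pass
b["wait_us"].print_log2_hist("usecs")

You could extend it to key the histogram by process name, or attach the
same probe pair to whatever locking function you suspect, if you want to
see which workers are waiting the most.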


BTW, this data allocation race reminds me that Josef and Nikolay
discussed this possible problem before, but we don't yet have a solid
solution for it.

So I guess what we can do now is just collect enough data to prove
it's already a big problem.

Thanks,
Qu
> 
> Hans
> 
>> [...]



Thread overview: 17+ messages
2020-07-25 14:24 Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2 Hans van Kranenburg
2020-07-25 15:37 ` Holger Hoffstätte
2020-07-25 16:43   ` Hans van Kranenburg
2020-07-25 19:44     ` Holger Hoffstätte
2020-07-25 21:03       ` Hans van Kranenburg
2020-07-26  1:00         ` Chris Murphy
2020-07-25 21:27 ` Hans van Kranenburg
2020-07-26  8:10   ` A L
2020-07-26  0:50 ` Chris Murphy
2020-07-27 11:09 ` Qu Wenruo
2020-07-27 17:17   ` Hans van Kranenburg
2020-07-27 19:23     ` Chris Murphy
2020-07-27 23:16     ` Chris Murphy
2020-07-28  0:51     ` Qu Wenruo
2020-07-28  1:52       ` Qu Wenruo
2020-07-28 14:52         ` Hans van Kranenburg
2020-07-29  0:15           ` Qu Wenruo [this message]
