From: Hans van Kranenburg <hans@knorrie.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2
Date: Tue, 28 Jul 2020 16:52:08 +0200
Message-ID: <bfba7719-d976-d7e3-2956-f0f200de623f@knorrie.org>
In-Reply-To: <5155a008-d534-18d3-5416-1c879031b45d@gmx.com>

On 7/28/20 3:52 AM, Qu Wenruo wrote:
> 
> 
> On 2020/7/28 上午8:51, Qu Wenruo wrote:
>>
>>
>> On 2020/7/28 上午1:17, Hans van Kranenburg wrote:
>>> Hi!
>>>
>>> [...]
>>>>
>>>> Since it's almost a dead CPU burner loop, regular sleep based lockup
>>>> detector won't help much.
>>>
>>> Here's a flame graph of 180 seconds, taken from the kernel thread pid:
>>>
>>> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2020-07-27-perf-kworker-flush-btrfs.svg
>>
>> That's really awesome!
>>
>>>
>>>> You can try trace events first to see which trace events get executed the
>>>> most frequently, then try to add probe points to pin down the real cause.
>>>
>>> From the default collection, I already got the following, a few days
>>> ago, by enabling find_free_extent and btrfs_cow_block:
>>>
>>> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2020-07-25-find_free_extent.txt
> 
> This output is in fact pretty confusing, and may give you a false view
> of the callers of find_free_extent().
> 
> It always shows "EXTENT_TREE" as the owner, but that's due to a bad
> decision on the trace event.
> 
> I have submitted a patch addressing it, and added you to the CC.

Oh right, thanks, that actually makes a lot of sense, lol.

I was misled because at first sight I thought, "yeah, obviously, where
else than in the extent tree would you do the administration of
allocated blocks?", and didn't yet realize that it made no sense.

> Would you mind re-capturing the events with that patch?
> That way we could get a clearer idea of which tree sees the most
> concurrency.

Yes. I will do that and try to reproduce the symptoms with as few
actions in parallel as possible.
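
Roughly what I have in mind for the capture, once the patched kernel is
booted (paths and event names from memory, so take them with a grain of
salt):

  cd /sys/kernel/debug/tracing
  # enable just the two events of interest
  echo 1 > events/btrfs/find_free_extent/enable
  echo 1 > events/btrfs/btrfs_cow_block/enable
  # collect while the rsync / btrfs receive workload is running
  cat trace_pipe > /tmp/btrfs-events.txt

(trace-cmd record -e btrfs:find_free_extent -e btrfs:btrfs_cow_block
would be the equivalent shortcut.)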

I've been away most of the day today; I will see how far I get later,
and otherwise continue tomorrow.

What you *can* see in the current output already, however, is that the
kworker/u16:3-13887 thread is doing all the DATA work, while many
different processes (rsync, btrfs receive) each do the find_free_extent
work for METADATA themselves. That's already an interesting difference.
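
(When I redo the capture I'll do a quick per-task tally instead of
eyeballing it; a rough sketch, assuming the capture file from the
commands above, the usual task-pid column at the start of each trace
line, and the flags printed as (DATA)/(METADATA):)

  grep find_free_extent /tmp/btrfs-events.txt | grep -w METADATA \
      | awk '{print $1}' | sort | uniq -c | sort -rn | head
  grep find_free_extent /tmp/btrfs-events.txt | grep -w DATA \
      | awk '{print $1}' | sort | uniq -c | sort -rn | head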

So, one direction to look into is which tasks are all trying to grab
that spin lock. If it is per 'space' (which sounds logical, since the
workers will never clash, because a whole block group belongs to only
one 'space'), then I don't see why kworker/u16:18-11336 would spend a
third of its time busy-waiting on that lock while it's the only process
working on METADATA.

But, I'll gather some more logs and pictures.
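
For reference, the flame graph linked above came from more or less the
standard perf + FlameGraph recipe, roughly (reconstructed from memory,
the sampling frequency is an arbitrary choice):

  # sample the busy kworker for 180 seconds, including call graphs
  perf record -F 99 -g -p <pid of the kworker> -- sleep 180
  # fold the stacks and render the SVG with the FlameGraph scripts
  perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > kworker-flush.svg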

Do you know of a way (some RTFM pointer?) to debug which tasks are
contending on the same lock? I haven't researched that yet.
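
(The only candidate I'm aware of myself is the kernel's lockstat
facility, but I haven't checked whether it is usable here; a rough
sketch, assuming a kernel built with CONFIG_LOCK_STAT=y:)

  # clear old statistics and enable collection
  echo 0 > /proc/lock_stat
  echo 1 > /proc/sys/kernel/lock_stat
  # ... let the rsync / btrfs receive workload run for a while ...
  echo 0 > /proc/sys/kernel/lock_stat
  # contention counts and wait times, per lock class
  less /proc/lock_stat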

Hans

> [...]

Thread overview: 17+ messages
2020-07-25 14:24 Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2 Hans van Kranenburg
2020-07-25 15:37 ` Holger Hoffstätte
2020-07-25 16:43   ` Hans van Kranenburg
2020-07-25 19:44     ` Holger Hoffstätte
2020-07-25 21:03       ` Hans van Kranenburg
2020-07-26  1:00         ` Chris Murphy
2020-07-25 21:27 ` Hans van Kranenburg
2020-07-26  8:10   ` A L
2020-07-26  0:50 ` Chris Murphy
2020-07-27 11:09 ` Qu Wenruo
2020-07-27 17:17   ` Hans van Kranenburg
2020-07-27 19:23     ` Chris Murphy
2020-07-27 23:16     ` Chris Murphy
2020-07-28  0:51     ` Qu Wenruo
2020-07-28  1:52       ` Qu Wenruo
2020-07-28 14:52         ` Hans van Kranenburg [this message]
2020-07-29  0:15           ` Qu Wenruo
