* slow btrfs with a single kworker process using 100% CPU
@ 2017-08-16  6:04 Stefan Priebe - Profihost AG
  2017-08-16  6:53 ` Marat Khalili
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16  6:04 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've one system where a single kworker process is using 100% CPU;
sometimes a second process comes up with 100% CPU [btrfs-transacti]. Is
there anything I can do to get the old speed again or find the culprit?

Greets,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  6:04 slow btrfs with a single kworker process using 100% CPU Stefan Priebe - Profihost AG
@ 2017-08-16  6:53 ` Marat Khalili
  2017-08-16  8:37   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Marat Khalili @ 2017-08-16  6:53 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-btrfs

> I've one system where a single kworker process is using 100% CPU;
> sometimes a second process comes up with 100% CPU [btrfs-transacti]. Is
> there anything I can do to get the old speed again or find the culprit?

1. Do you use quotas (qgroups)?

2. Do you have a lot of snapshots? Have you deleted some recently?

More info about your system would help too.

--

With Best Regards,
Marat Khalili


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  6:53 ` Marat Khalili
@ 2017-08-16  8:37   ` Stefan Priebe - Profihost AG
  2017-08-16  9:02     ` Konstantin V. Gavrilenko
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16  8:37 UTC (permalink / raw)
  To: Marat Khalili, linux-btrfs

Am 16.08.2017 um 08:53 schrieb Marat Khalili:
>> I've one system where a single kworker process is using 100% CPU;
>> sometimes a second process comes up with 100% CPU [btrfs-transacti]. Is
>> there anything I can do to get the old speed again or find the culprit?
> 
> 1. Do you use quotas (qgroups)?

No qgroups and no quota.

> 2. Do you have a lot of snapshots? Have you deleted some recently?

1413 snapshots. I'm deleting 50 of them every night, but the btrfs-cleaner
process isn't running / consuming CPU currently.

> More info about your system would help too.
Kernel is the OpenSuSE Leap 42.3 one (4.4-based).

btrfs is mounted with compress-force=zlib.

btrfs is running as a raid0 on top of 4 md RAID5 devices.
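For reference, a stack like this would be assembled roughly as follows
(illustrative commands only, not the exact ones used here; the mdadm step
is repeated for md1..md3):

# mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[c-f]1
# mkfs.btrfs -d raid0 -m raid0 /dev/md0 /dev/md1 /dev/md2 /dev/md3
# mount -o compress-force=zlib /dev/md0 /vmbackup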

Greets,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  8:37   ` Stefan Priebe - Profihost AG
@ 2017-08-16  9:02     ` Konstantin V. Gavrilenko
  2017-08-16  9:26       ` Stefan Priebe - Profihost AG
  2017-08-16 23:21       ` Peter Grandi
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin V. Gavrilenko @ 2017-08-16  9:02 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Marat Khalili, linux-btrfs

Could be a similar issue to what I had recently, with RAID5 and a 256kb chunk size.

Please provide more information about your RAID setup.

p.s.
you can also check the thread "Btrfs + compression = slow performance and high cpu usage"


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  9:02     ` Konstantin V. Gavrilenko
@ 2017-08-16  9:26       ` Stefan Priebe - Profihost AG
  2017-08-16 11:48         ` Konstantin V. Gavrilenko
  2017-08-16 23:21       ` Peter Grandi
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16  9:26 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko; +Cc: Marat Khalili, linux-btrfs

Am 16.08.2017 um 11:02 schrieb Konstantin V. Gavrilenko:
> Could be a similar issue to what I had recently, with RAID5 and a 256kb chunk size.
> Please provide more information about your RAID setup.

Hope this helps:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath]
[raid0] [raid10]
md0 : active raid5 sdd1[1] sdf1[4] sdc1[0] sde1[2]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/4] [UUUU]
      bitmap: 6/30 pages [24KB], 65536KB chunk

md2 : active raid5 sdm1[2] sdl1[1] sdk1[0] sdn1[4]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/4] [UUUU]
      bitmap: 7/30 pages [28KB], 65536KB chunk

md1 : active raid5 sdi1[2] sdg1[0] sdj1[4] sdh1[1]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/4] [UUUU]
      bitmap: 7/30 pages [28KB], 65536KB chunk

md3 : active raid5 sdp1[1] sdo1[0] sdq1[2] sdr1[4]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/4] [UUUU]
      bitmap: 6/30 pages [24KB], 65536KB chunk
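(So each md set is 3 data disks + 1 parity with a 512k chunk, i.e. a full
stripe holds 3 x 512k = 1536k of data; anything smaller goes through RMW.
That number matters for the discussion below.)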

# btrfs fi usage /vmbackup/
Overall:
    Device size:                  43.65TiB
    Device allocated:             31.98TiB
    Device unallocated:           11.67TiB
    Device missing:                  0.00B
    Used:                         30.80TiB
    Free (estimated):             12.84TiB      (min: 12.84TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID0: Size:31.83TiB, Used:30.66TiB
   /dev/md0        7.96TiB
   /dev/md1        7.96TiB
   /dev/md2        7.96TiB
   /dev/md3        7.96TiB

Metadata,RAID0: Size:153.00GiB, Used:141.34GiB
   /dev/md0       38.25GiB
   /dev/md1       38.25GiB
   /dev/md2       38.25GiB
   /dev/md3       38.25GiB

System,RAID0: Size:128.00MiB, Used:2.28MiB
   /dev/md0       32.00MiB
   /dev/md1       32.00MiB
   /dev/md2       32.00MiB
   /dev/md3       32.00MiB

Unallocated:
   /dev/md0        2.92TiB
   /dev/md1        2.92TiB
   /dev/md2        2.92TiB
   /dev/md3        2.92TiB


Stefan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  9:26       ` Stefan Priebe - Profihost AG
@ 2017-08-16 11:48         ` Konstantin V. Gavrilenko
  2017-08-16 12:00           ` Roman Mamedov
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin V. Gavrilenko @ 2017-08-16 11:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Marat Khalili, linux-btrfs, Peter Grandi



I believe a chunk size of 512kb is even worse for performance than the default setting on my HW RAID of 256kb.

Peter Grandi explained it earlier on in one of his posts.

QTE
++++++
That runs counter to this simple story: suppose a program is
doing 64KiB IO:

* For *reads*, there are 4 data drives and the strip size is
  16KiB: the 64KiB will be read in parallel on 4 drives. If the
  strip size is 256KiB then the 64KiB will be read sequentially
  from just one disk, and 4 successive reads will be read
  sequentially from the same drive.

* For *writes* on a parity RAID like RAID5 things are much, much
  more extreme: the 64KiB will be written with 16KiB strips on a
  5-wide RAID5 set in parallel to 5 drives, with 4 stripes being
  updated with RMW. But with 256KiB strips it will partially
  update 5 drives, because the stripe is 1024+256KiB, and it
  needs to do RMW, and four successive 64KiB writes will need to
  do that too, even if only one drive is updated. Usually for
  RAID5 there is an optimization that means that only the
  specific target drive and the parity drives(s) need RMW, but
  it is still very expensive.

This is the "storage for beginners" version; what happens in
practice however depends a lot on the specific workload profile
(typical read/write size and latencies and rates), caching and
queueing algorithms in both Linux and the HA firmware.
++++++
UNQTE
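To put rough numbers on that for the arrays in this thread (4-device md
RAID5 with a 512k chunk, so a full stripe is 3 x 512k data + 512k parity):
a 64k write touches at most one chunk, so with the usual small-write
optimization md reads the old 64k and the old parity, then writes the new
64k and the new parity, i.e. 4 I/Os for 64k of payload. Only a write
covering the whole 1536k of stripe data avoids the reads entirely. (This
is a back-of-the-envelope sketch of the standard RMW path, not a
measurement of this particular setup.)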


I've also found another explanation of the same problem, covering the right chunk size and how it works, here:
http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/#more-1212



So in my understanding, when working with compression, the compressed
writes handed to the FS will vary between 128kb (incompressible,
urandom-like data) and 32kb (highly compressible, e.g. zeroes).

And in our setup of large chunk sizes, if we need to write 32kb-128kb of
compressed data, the RAID5 would need to perform 3 read operations and 2
write operations.

As updating a parity chunk requires either
- The original chunk, the new chunk, and the old parity block
- Or, all chunks (except for the parity chunk) in the stripe

disk        disk1   disk2   disk3   disk4
chunk size  512kb   512kb   512kb   512kbP

So in the worst-case scenario, in order to write 32kb, RAID5 would need to read (480 + 512 + P512) and then write (32 + P512).

That's my current understanding of the situation.
I was planning to write an update to my story later on, once I hopefully
solve the problem. But an intermediate update is that I have performed a
full defrag with full compression (2 days), then a balance of all the data
(10 days), and it didn't help the performance.

So now I am moving the data off the array and will be rebuilding it with a
64k or 32k chunk size and checking the performance.
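(Side note: instead of a full rebuild, a reasonably recent mdadm can also
reshape the chunk size in place, along the lines of

# mdadm --grow /dev/md0 --chunk=64 --backup-file=/root/md0-reshape.bak

I have not tested that here, so treat it as a sketch; the reshape is slow
and the backup file matters.)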

VG,
kos




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 11:48         ` Konstantin V. Gavrilenko
@ 2017-08-16 12:00           ` Roman Mamedov
  2017-08-16 12:29             ` Konstantin V. Gavrilenko
  0 siblings, 1 reply; 17+ messages in thread
From: Roman Mamedov @ 2017-08-16 12:00 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko
  Cc: Stefan Priebe - Profihost AG, Marat Khalili, linux-btrfs, Peter Grandi

On Wed, 16 Aug 2017 12:48:42 +0100 (BST)
"Konstantin V. Gavrilenko" <k.gavrilenko@arhont.com> wrote:

> I believe a chunk size of 512kb is even worse for performance than the default setting on my HW RAID of 256kb.

It might be, but that does not explain the original problem reported at all.
If mdraid performance were the bottleneck, you would see high iowait and
possibly some CPU load from the mdX_raidY threads, but not a single Btrfs
thread pegged at 100% CPU.

> So now I am moving the data from the array and will be rebuilding it with 64
> or 32 chunk size and checking the performance.

64K is the sweet spot for RAID5/6:
http://louwrentius.com/linux-raid-level-and-chunk-size-the-benchmarks.html
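To sanity-check that against a specific workload before rebuilding, a
quick fio run (all parameters are only an example) gives a number to
compare across chunk sizes:

# fio --name=chunktest --directory=/vmbackup --rw=randwrite --bs=64k \
      --size=4g --runtime=60 --time_based --direct=1 --ioengine=libaio --iodepth=8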

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 12:00           ` Roman Mamedov
@ 2017-08-16 12:29             ` Konstantin V. Gavrilenko
  2017-08-16 12:38               ` Stefan Priebe - Profihost AG
                                 ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Konstantin V. Gavrilenko @ 2017-08-16 12:29 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: Stefan Priebe - Profihost AG, Marat Khalili, linux-btrfs, Peter Grandi

Roman, initially I had a single process occupying 100% CPU; when I
triggered sysrq it was indicated as being in "btrfs_find_space_for_alloc",
but that was when I used the autodefrag, compress, forcecompress and
commit=10 mount flags, and space_cache was v1 by default.
When I switched to "relatime,compress-force=zlib,space_cache=v2" the 100%
CPU disappeared, but the shite performance remained.
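(For anyone landing here with the same symptom: cheap ways to see where a
spinning kworker sits include, for example,

# cat /proc/<pid>/stack
# echo l > /proc/sysrq-trigger    (backtraces of busy CPUs end up in dmesg)
# perf top -p <pid>

commands illustrative, substitute the kworker's pid.)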


As to the chunk size, there is no information in the article about the type of data that was used, while in our case we are pretty certain about the compressed block size (32-128kb). I am currently leaning towards 32k, as it might be ideal in a situation where we have a 5-disk raid5 array.

In theory:
1. The minimum compressed write (32k) would fill the chunk on a single disk, thus the IO cost of the operation would be 2 reads (original chunk + original parity) and 2 writes (new chunk + new parity).

2. The maximum compressed write (128k) would require the update of 1 chunk on each of the 4 data disks + 1 parity write.
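(If that reasoning holds, the 128k case on a 5-disk set with 32k chunks is
exactly one full stripe, 4 x 32k data + 32k parity, so md could compute
the parity without reading anything back, which is the cheapest write a
RAID5 can do.)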



Stefan, what mount flags do you use?

kos




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 12:29             ` Konstantin V. Gavrilenko
@ 2017-08-16 12:38               ` Stefan Priebe - Profihost AG
  2017-08-16 12:46               ` Stefan Priebe - Profihost AG
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16 12:38 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi


Am 16.08.2017 um 14:29 schrieb Konstantin V. Gavrilenko:
> [...]
> Stefan, what mount flags do you use?

noatime,compress-force=zlib,noacl,space_cache,skip_balance,subvolid=5,subvol=/

Greets,
Stefan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 12:29             ` Konstantin V. Gavrilenko
  2017-08-16 12:38               ` Stefan Priebe - Profihost AG
@ 2017-08-16 12:46               ` Stefan Priebe - Profihost AG
  2017-08-16 13:04               ` Stefan Priebe - Profihost AG
  2017-08-17  5:47               ` Stefan Priebe - Profihost AG
  3 siblings, 0 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16 12:46 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi


Am 16.08.2017 um 14:29 schrieb Konstantin V. Gavrilenko:
> [...]
> Stefan, what mount flags do you use?

noatime,compress-force=zlib,noacl,space_cache,skip_balance,subvolid=5,subvol=/

Greets,
Stefan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 12:29             ` Konstantin V. Gavrilenko
  2017-08-16 12:38               ` Stefan Priebe - Profihost AG
  2017-08-16 12:46               ` Stefan Priebe - Profihost AG
@ 2017-08-16 13:04               ` Stefan Priebe - Profihost AG
  2017-08-17  5:47               ` Stefan Priebe - Profihost AG
  3 siblings, 0 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-16 13:04 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi


Am 16.08.2017 um 14:29 schrieb Konstantin V. Gavrilenko:
> Roman, initially I had a single process occupying 100% CPU; when I
> triggered sysrq it was indicated as being in "btrfs_find_space_for_alloc",
> but that was when I used the autodefrag, compress, forcecompress and
> commit=10 mount flags, and space_cache was v1 by default.
> When I switched to "relatime,compress-force=zlib,space_cache=v2" the 100%
> CPU disappeared, but the shite performance remained.

space_cache=v2 is not supported by the OpenSuSE kernel - but I compile the
kernel myself anyway. Is there a patchset that adds support for
space_cache=v2?
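(Background, as far as I know: the free space tree behind space_cache=v2
was merged in mainline Linux 4.5, so a 4.4-based kernel needs the relevant
fs/btrfs patches backported. Whether a filesystem has it enabled can be
checked with something like

# btrfs inspect-internal dump-super /dev/md0 | grep -i compat_ro

which shows the FREE_SPACE_TREE compat_ro flags once v2 has been mounted;
the exact output depends on the btrfs-progs version.)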

Greets,
Stefan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16  9:02     ` Konstantin V. Gavrilenko
  2017-08-16  9:26       ` Stefan Priebe - Profihost AG
@ 2017-08-16 23:21       ` Peter Grandi
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Grandi @ 2017-08-16 23:21 UTC (permalink / raw)
  To: Linux fs Btrfs

>>> I've one system where a single kworker process is using 100%
>>> CPU; sometimes a second process comes up with 100% CPU
>>> [btrfs-transacti]. [ ... ]

>> [ ... ] 1413 snapshots. I'm deleting 50 of them every night, but the
>> btrfs-cleaner process isn't running / consuming CPU currently.

Reminder that:

https://btrfs.wiki.kernel.org/index.php/Gotchas#Having_many_subvolumes_can_be_very_slow

"The cost of several operations, including currently balance, device
delete and fs resize, is proportional to the number of subvolumes,
including snapshots, and (slightly super-linearly) the number of
extents in the subvolumes."

>> [ ... ] btrfs is mounted with compress-force=zlib

> Could be similar issue as what I had recently, with the RAID5 and
> 256kb chunk size. please provide more information about your RAID
> setup.

It is similar, but updating compressed files in place can create
this situation even without RAID5 RMW:

https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation

"Files with a lot of random writes can become heavily fragmented
(10000+ extents) causing thrashing on HDDs and excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. ...
Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot
of CPU time (in spikes, possibly triggered by syncs)."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-16 12:29             ` Konstantin V. Gavrilenko
                                 ` (2 preceding siblings ...)
  2017-08-16 13:04               ` Stefan Priebe - Profihost AG
@ 2017-08-17  5:47               ` Stefan Priebe - Profihost AG
  2017-08-17  7:43                 ` Stefan Priebe - Profihost AG
  3 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-17  5:47 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi

I've backported the free space cache tree to my kernel and hopefully all
fixes related to it.

The first mount with clear_cache,space_cache=v2 took around 5 hours.

Currently I do not see any kworker at 100% CPU, but I don't see much load
at all.

btrfs-transaction takes around 2-4% CPU together with a kworker process
and some 2-3% mdadm processes. I/O wait is at 3%.

That's it. It does not do much more. Writing a file does not work.

Greets,
Stefan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-17  5:47               ` Stefan Priebe - Profihost AG
@ 2017-08-17  7:43                 ` Stefan Priebe - Profihost AG
  2017-08-20 11:00                   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-17  7:43 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi

While mounting the device, dmesg is full of:
[ 1320.325147]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
[ 1440.330008] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
[ 1440.330014]       Not tainted 4.4.82+525-ph #1
[ 1440.330015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1440.330020] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
[ 1440.330024]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
[ 1440.330026]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
[ 1440.330028]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
[ 1440.330028] Call Trace:
[ 1440.330053]  [<ffffffffa86d2b75>] schedule+0x35/0x80
[ 1440.330120]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
[ 1440.330159]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
[ 1440.330186]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
[ 1440.330194]  [<ffffffffa80a389b>] kthread+0xeb/0x110
[ 1440.330200]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
[ 1440.333316] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

[ 1440.333317] Leftover inexact backtrace:

[ 1440.333322]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
[ 1560.335839] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
[ 1560.335843]       Not tainted 4.4.82+525-ph #1
[ 1560.335843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.335848] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
[ 1560.335852]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
[ 1560.335854]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
[ 1560.335856]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
[ 1560.335857] Call Trace:
[ 1560.335875]  [<ffffffffa86d2b75>] schedule+0x35/0x80
[ 1560.335953]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
[ 1560.335978]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
[ 1560.335995]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
[ 1560.336001]  [<ffffffffa80a389b>] kthread+0xeb/0x110
[ 1560.336006]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
[ 1560.337829] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

[ 1560.337830] Leftover inexact backtrace:

[ 1560.337833]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
[ 1680.341127] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
[ 1680.341130]       Not tainted 4.4.82+525-ph #1
[ 1680.341131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1680.341134] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
[ 1680.341137]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
[ 1680.341138]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
[ 1680.341139]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
[ 1680.341140] Call Trace:
[ 1680.341155]  [<ffffffffa86d2b75>] schedule+0x35/0x80
[ 1680.341211]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
[ 1680.341237]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
[ 1680.341252]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
[ 1680.341258]  [<ffffffffa80a389b>] kthread+0xeb/0x110
[ 1680.341262]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
[ 1680.343062] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

Stefan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-17  7:43                 ` Stefan Priebe - Profihost AG
@ 2017-08-20 11:00                   ` Stefan Priebe - Profihost AG
  2017-08-20 12:34                     ` Marat Khalili
  2017-08-28 18:09                     ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-20 11:00 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi

Hello,

this still happens with space_cache=v2. So I don't think it is
space_cache related?

Stefan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-20 11:00                   ` Stefan Priebe - Profihost AG
@ 2017-08-20 12:34                     ` Marat Khalili
  2017-08-28 18:09                     ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 17+ messages in thread
From: Marat Khalili @ 2017-08-20 12:34 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Konstantin V. Gavrilenko, Roman Mamedov, linux-btrfs, Peter Grandi

I'm a btrfs user, not a developer; developers can probably provide a more
detailed explanation by looking at the stack traces in dmesg etc., but it's
possible that there's just no quick fix (yet). I presume these are 1413
_full-volume_ snapshots. Then some operations have to process
43.65TiB*1413 = ~60PiB of data (well, metadata for that data, but it's
still a lot as you may guess, especially if it's all heavily fragmented).
You can either gradually reduce the number of snapshots and wait (it may
drag on for weeks and months), or copy everything to a different device
and reformat this one, then don't create that many snapshots again.
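If you go the gradual route, a sketch for pacing the deletions (path and
batch size invented for illustration):

for snap in $(btrfs subvolume list -s /vmbackup | awk '{print $NF}' | head -n 10); do
    btrfs subvolume delete "/vmbackup/$snap"
done
btrfs subvolume sync /vmbackup

'btrfs subvolume sync' waits until btrfs-cleaner has actually processed
the queued deletions, so run it between batches.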

As for "blocked for more than 120 seconds" messages in dmesg, I see them 
every night after I delete about a dozen of snapshots ~10TiB in _total_ 
volume, albeit with qgroups. These messages usually subside after about 
couple of hours. They only tell you what you already know: some btrfs 
operations are painfully slow.

--

With Best Regards,
Marat Khalili


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: slow btrfs with a single kworker process using 100% CPU
  2017-08-20 11:00                   ` Stefan Priebe - Profihost AG
  2017-08-20 12:34                     ` Marat Khalili
@ 2017-08-28 18:09                     ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-08-28 18:09 UTC (permalink / raw)
  To: Konstantin V. Gavrilenko, Roman Mamedov
  Cc: Marat Khalili, linux-btrfs, Peter Grandi

Hello,

a trace of the kworker looks like this:
   kworker/u24:4-13405 [003] .... 344186.202535: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202535: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202535: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202535: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202535: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202535: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202536: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202554: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202554: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202554: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202554: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202555: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202555: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202555: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202555: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202555: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202556: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202560: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202560: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202561: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202561: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202561: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202561: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202561: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202562: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202562: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202562: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202562: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202562: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202563: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202564: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202564: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202564: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202564: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202564: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202565: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202566: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202567: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202567: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202567: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202567: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202568: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202568: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202568: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202568: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202569: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202576: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202576: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202577: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202577: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202577: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202577: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202578: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202578: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202578: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202578: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202584: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202584: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202584: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202585: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202585: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202585: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202585: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202586: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202587: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202587: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202587: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202587: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202588: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202588: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202588: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202589: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202590: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202590: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202590: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202590: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202590: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202591: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202591: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202591: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202591: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202591: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202592: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202593: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202593: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202593: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202597: __get_raid_index <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202597: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202597: btrfs_put_block_group <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: _cond_resched <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: down_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: _raw_spin_lock <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202598: _raw_spin_lock <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202599: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003] .... 344186.202623: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202623: up_read <-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202623: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003] .... 344186.202623: _cond_resched
<-find_free_extent

Greets,
Stefan

On 20.08.2017 at 13:00, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> this still happens with space_cache v2, so I don't think it is
> space_cache related.
> 
> Stefan
> 
Am 17.08.2017 um 09:43 schrieb Stefan Priebe - Profihost AG:
>> While mounting the device, dmesg is full of:
>> [ 1320.325147]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
>> [ 1440.330008] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
>> [ 1440.330014]       Not tainted 4.4.82+525-ph #1
>> [ 1440.330015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1440.330020] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
>> [ 1440.330024]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
>> [ 1440.330026]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
>> [ 1440.330028]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
>> [ 1440.330028] Call Trace:
>> [ 1440.330053]  [<ffffffffa86d2b75>] schedule+0x35/0x80
>> [ 1440.330120]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
>> [ 1440.330159]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
>> [ 1440.330186]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
>> [ 1440.330194]  [<ffffffffa80a389b>] kthread+0xeb/0x110
>> [ 1440.330200]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
>> [ 1440.333316] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
>>
>> [ 1440.333317] Leftover inexact backtrace:
>>
>> [ 1440.333322]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
>> [ 1560.335839] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
>> [ 1560.335843]       Not tainted 4.4.82+525-ph #1
>> [ 1560.335843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1560.335848] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
>> [ 1560.335852]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
>> [ 1560.335854]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
>> [ 1560.335856]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
>> [ 1560.335857] Call Trace:
>> [ 1560.335875]  [<ffffffffa86d2b75>] schedule+0x35/0x80
>> [ 1560.335953]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
>> [ 1560.335978]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
>> [ 1560.335995]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
>> [ 1560.336001]  [<ffffffffa80a389b>] kthread+0xeb/0x110
>> [ 1560.336006]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
>> [ 1560.337829] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
>>
>> [ 1560.337830] Leftover inexact backtrace:
>>
>> [ 1560.337833]  [<ffffffffa80a37b0>] ? kthread_park+0x60/0x60
>> [ 1680.341127] INFO: task btrfs-transacti:3701 blocked for more than 120 seconds.
>> [ 1680.341130]       Not tainted 4.4.82+525-ph #1
>> [ 1680.341131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1680.341134] btrfs-transacti D ffff88080964fdd8     0  3701      2 0x00080000
>> [ 1680.341137]  ffff88080964fdd8 ffffffffa8e10500 ffff880859d4cb00 ffff880809650000
>> [ 1680.341138]  ffff881056069800 ffff88080964fe08 ffff88080a100000 ffff88080a100068
>> [ 1680.341139]  ffff88080964fdf0 ffffffffa86d2b75 ffff880036c92000 ffff88080964fe58
>> [ 1680.341140] Call Trace:
>> [ 1680.341155]  [<ffffffffa86d2b75>] schedule+0x35/0x80
>> [ 1680.341211]  [<ffffffffc04cfa55>] btrfs_commit_transaction.part.24+0x245/0xa30 [btrfs]
>> [ 1680.341237]  [<ffffffffc04d027a>] btrfs_commit_transaction+0x3a/0x70 [btrfs]
>> [ 1680.341252]  [<ffffffffc04ca3c5>] transaction_kthread+0x1d5/0x240 [btrfs]
>> [ 1680.341258]  [<ffffffffa80a389b>] kthread+0xeb/0x110
>> [ 1680.341262]  [<ffffffffa86d750f>] ret_from_fork+0x3f/0x70
>> [ 1680.343062] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
>>
>> Stefan
>>
>> On 17.08.2017 at 07:47, Stefan Priebe - Profihost AG wrote:
>>> I've backported the free space cache tree to my kernel, and hopefully
>>> all fixes related to it.
>>>
>>> The first mount with clear_cache,space_cache=v2 took around 5 hours.
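For reference, the one-time switch is just a mount-option change. A minimal
sketch, assuming a btrfs on /dev/md0 mounted at /srv/data (both names are
placeholders, not Stefan's actual setup):

  # the first mount with these options drops the old v1 cache and builds
  # the free space tree, which can take hours on a large filesystem
  mount -o clear_cache,space_cache=v2 /dev/md0 /srv/data
  # subsequent mounts only need
  mount -o space_cache=v2 /dev/md0 /srv/data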
>>>
>>> Currently I do not see any kworker at 100% CPU, but I don't see much
>>> load at all either.
>>>
>>> btrfs-transaction takes around 2-4% CPU together with a kworker process,
>>> and some mdadm processes take 2-3%. I/O wait is at 3%.
>>>
>>> That's it; it does not do much more. Writing a file does not work.
>>>
>>> Greets,
>>> Stefan
>>>
>>> On 16.08.2017 at 14:29, Konstantin V. Gavrilenko wrote:
>>>> Roman, initially I had a single process occupying 100% CPU; sysrq showed it was in "btrfs_find_space_for_alloc",
>>>> but that was when I used the autodefrag, compress, compress-force and commit=10 mount flags, with space_cache at v1 by default.
>>>> When I switched to "relatime,compress-force=zlib,space_cache=v2" the 100% CPU disappeared, but the poor performance remained.
>>>>
>>>>
>>>> As to the chunk size, there is no information in the article about the type of data that was used, while in our case we are pretty certain about the compressed block size (32-128k). I am currently leaning towards 32k, as it might be ideal in a situation where we have a 5-disk RAID5 array.
>>>>
>>>> In theory:
>>>> 1. The minimum compressed write (32k) would fill the chunk on a single disk, so the I/O cost of the operation would be 2 reads (original chunk + original parity) and 2 writes (new chunk + new parity).
>>>>
>>>> 2. The maximum compressed write (128k) would require updating 1 chunk on each of the 4 data disks plus 1 parity write, with no reads, since the parity of a full stripe can be computed from the new data alone.
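That arithmetic can be sanity-checked with a small shell sketch (the 4+1
layout, the 32k chunk size and the simplified read-modify-write model are
this thread's assumptions, not mdraid's actual write path):

  data_disks=4; chunk_kib=32
  for write_kib in 32 128; do
      # how many data chunks of one stripe the write touches
      chunks=$(( (write_kib + chunk_kib - 1) / chunk_kib ))
      if [ "$chunks" -ge "$data_disks" ]; then
          # full-stripe write: parity is computed from the new data, no reads
          echo "${write_kib}k write: 0 reads, $(( data_disks + 1 )) writes"
      else
          # partial stripe: read old chunk(s) and parity, write both back
          echo "${write_kib}k write: $(( chunks + 1 )) reads, $(( chunks + 1 )) writes"
      fi
  done

This prints "2 reads, 2 writes" for the 32k case and "0 reads, 5 writes"
for the 128k case, reproducing points 1 and 2 above.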
>>>>
>>>>
>>>>
>>>> Stefan, what mount flags do you use?
>>>>
>>>> kos
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Roman Mamedov" <rm@romanrm.net>
>>>> To: "Konstantin V. Gavrilenko" <k.gavrilenko@arhont.com>
>>>> Cc: "Stefan Priebe - Profihost AG" <s.priebe@profihost.ag>, "Marat Khalili" <mkh@rqc.ru>, linux-btrfs@vger.kernel.org, "Peter Grandi" <pg@btrfs.list.sabi.co.uk>
>>>> Sent: Wednesday, 16 August, 2017 2:00:03 PM
>>>> Subject: Re: slow btrfs with a single kworker process using 100% CPU
>>>>
>>>> On Wed, 16 Aug 2017 12:48:42 +0100 (BST)
>>>> "Konstantin V. Gavrilenko" <k.gavrilenko@arhont.com> wrote:
>>>>
>>>>> I believe a chunk size of 512kb is even worse for performance than the default setting of 256kb on my HW RAID.
>>>>
>>>> It might be, but that does not explain the original problem reported at all.
>>>> If mdraid performance were the bottleneck, you would see high iowait and
>>>> possibly some CPU load from the mdX_raidY threads, but not a single Btrfs
>>>> thread pegged at 100% CPU.
>>>>
>>>>> So now I am moving the data off the array and will rebuild it with a 64K
>>>>> or 32K chunk size and check the performance.
>>>>
>>>> 64K is the sweet spot for RAID5/6:
>>>> http://louwrentius.com/linux-raid-level-and-chunk-size-the-benchmarks.html
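To put concrete numbers on that trade-off for the 5-disk RAID5 (4 data +
1 parity) discussed in this thread, a quick sketch (the disk count comes
from the thread; the loop is only illustrative):

  for chunk_kib in 32 64 256 512; do
      # full data-stripe width = chunk size x 4 data disks
      echo "chunk ${chunk_kib}K -> full data stripe $(( chunk_kib * 4 ))K"
  done

Only the 32K chunk yields a 128K full stripe, i.e. only then can the
largest compressed extent (128k) cover a whole stripe and skip the parity
read-modify-write entirely.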
>>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread


Thread overview: 17+ messages
2017-08-16  6:04 slow btrfs with a single kworker process using 100% CPU Stefan Priebe - Profihost AG
2017-08-16  6:53 ` Marat Khalili
2017-08-16  8:37   ` Stefan Priebe - Profihost AG
2017-08-16  9:02     ` Konstantin V. Gavrilenko
2017-08-16  9:26       ` Stefan Priebe - Profihost AG
2017-08-16 11:48         ` Konstantin V. Gavrilenko
2017-08-16 12:00           ` Roman Mamedov
2017-08-16 12:29             ` Konstantin V. Gavrilenko
2017-08-16 12:38               ` Stefan Priebe - Profihost AG
2017-08-16 12:46               ` Stefan Priebe - Profihost AG
2017-08-16 13:04               ` Stefan Priebe - Profihost AG
2017-08-17  5:47               ` Stefan Priebe - Profihost AG
2017-08-17  7:43                 ` Stefan Priebe - Profihost AG
2017-08-20 11:00                   ` Stefan Priebe - Profihost AG
2017-08-20 12:34                     ` Marat Khalili
2017-08-28 18:09                     ` Stefan Priebe - Profihost AG
2017-08-16 23:21       ` Peter Grandi
