* Enhancement Idea - Optional PGO+LTO build for btrfs-progs
@ 2021-07-14  2:51 DanglingPointer
  2021-07-14  5:00 ` Qu Wenruo
From: DanglingPointer @ 2021-07-14  2:51 UTC
  To: linux-btrfs; +Cc: danglingpointerexception

Recently we have been hit by some performance issues on the workstations
in my lab, which have large multi-terabyte btrfs arrays; I have detailed
this in a separate thread.  It got me thinking, however: why not add an
optional configure option for btrfs-progs that builds it with PGO, using
the entire regression test suite as the training workload?

The idea is:

 1. configure with an optional "-pgo" or "-fdo" switch which sets a
    relative path under the source root where the instrumentation files
    will go (let's start with gcc only for now, so *.gcda files into a
    folder), and adds the instrumentation compiler flag
 2. build btrfs-progs
 3. run every available test (make test && make test-fsck &&
    make test-convert)
 4. clean up everything except the instrumentation files
 5. rebuild without the instrumentation flag from point 1, using the
    instrumentation files for feedback-directed optimisation (FDO) (for
    gcc, also add the partial-training flag); add LTO (a rough sketch of
    the whole flow follows below).
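
Hand-rolled, the gcc-only flow would look roughly like this.  This is only
a sketch: the proposed configure switch does not exist yet, the profile
directory name is arbitrary, and -fprofile-partial-training needs a
reasonably recent gcc (10+).

  PROFDIR="$PWD/pgo-profiles"

  # steps 1+2: instrumented build
  ./autogen.sh
  ./configure CFLAGS="-O2 -fprofile-generate=$PROFDIR"
  make

  # step 3: train on the regression tests (this is the slow part)
  make test && make test-fsck && make test-convert

  # step 4: drop the objects, keep the *.gcda profiles in $PROFDIR
  make clean

  # step 5: optimised rebuild using the profiles, plus LTO
  ./configure CFLAGS="-O2 -fprofile-use=$PROFDIR -fprofile-partial-training -flto" \
              LDFLAGS="-flto"
  make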

I know btrfs is primarily IO-bound, not CPU-bound.  But I'm just thinking
of squeezing every last bit of efficiency out of whatever does run in the
CPU, cache and memory.

I suppose people can do the above on their own, but I was thinking that if
it were provided as an optional configure switch it would be easier for
people to do without extra hacking.  We would just need to add a warning
that it takes a long time: have a coffee.

The Python 3 build has the process above as an optional configure option,
and it caters for gcc and clang (it might even cater for icc).

Anyway, that's my idea for an enhancement.

I'd like to know your thoughts.  Cheers...


* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14  2:51 Enhancement Idea - Optional PGO+LTO build for btrfs-progs DanglingPointer
@ 2021-07-14  5:00 ` Qu Wenruo
  2021-07-14  7:34   ` DanglingPointer
From: Qu Wenruo @ 2021-07-14  5:00 UTC
  To: DanglingPointer, linux-btrfs



On 2021/7/14 10:51 AM, DanglingPointer wrote:
> Recently we have been hit by some performance issues on the workstations
> in my lab, which have large multi-terabyte btrfs arrays; I have detailed
> this in a separate thread.  It got me thinking, however: why not add an
> optional configure option for btrfs-progs that builds it with PGO, using
> the entire regression test suite as the training workload?
>
> The idea is:
>
> 1. configure with an optional "-pgo" or "-fdo" switch which sets a
>     relative path under the source root where the instrumentation files
>     will go (let's start with gcc only for now, so *.gcda files into a
>     folder), and adds the instrumentation compiler flag
> 2. build btrfs-progs
> 3. run every available test (make test && make test-fsck &&
>     make test-convert)
> 4. clean up everything except the instrumentation files
> 5. rebuild without the instrumentation flag from point 1, using the
>     instrumentation files for feedback-directed optimisation (FDO) (for
>     gcc, also add the partial-training flag); add LTO.

Why would you think btrfs-progs is the one that needs optimization?

From your original report, there is nothing btrfs-progs related at all.

All your workload, from scrub to regular IO, is handled by the kernel
btrfs module.

Thus optimizing btrfs-progs would have no impact.

Thanks,
Qu

* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14  5:00 ` Qu Wenruo
@ 2021-07-14  7:34   ` DanglingPointer
  2021-07-14  7:57     ` Qu Wenruo
From: DanglingPointer @ 2021-07-14  7:34 UTC
  To: Qu Wenruo, linux-btrfs; +Cc: danglingpointerexception

"Why would you think btrfs-progs is the one needs to optimization?"

Perhaps I should have given more context.  The data migration was taking
a very long time (days), and the pauses due to "btrfs-transacti" were
blocking all IO, including nfsd.  We thought, "should we run '$ btrfs
scrub <mount>' to make sure nothing had gone wrong?"

The problem is, scrubbing the whole RAID5 takes ages!  If we scrubbed only
one disk of the array, it would at least sample a quarter of the array,
with roughly a one-in-four chance of catching anything that had gone
wrong, and hopefully it wouldn't massively slow down the ongoing migration.

We tried it for a while on a single drive and it did indeed get 2x the
scrubbing throughput, but it was still very slow since we're talking
multiple terabytes on a single disk!  I believe the ETA was ~3 days.

Interestingly, scrubbing the whole lot (the entire RAID5 array) in one go
by just scrubbing the mount point gives a 4-day ETA, which is what we do
regularly every 3 months.  So even though it is slower per disk, it
finishes the whole array faster than doing one disk at a time sequentially.
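
For reference, the two variants compared here (scrubbing the whole
filesystem via the mount point versus a single member device) look roughly
like this; the mount point and device names are just placeholders:

  # whole-filesystem scrub (kicks off scrub on every member device)
  btrfs scrub start /mnt/array
  btrfs scrub status /mnt/array     # progress and ETA

  # scrub only one member device of the array
  btrfs scrub start /dev/sdb
  btrfs scrub status /dev/sdb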

Anyway, thanks for explaining what btrfs-progs does and how scrub speed is
independent of btrfs-progs and handled by the kernel ioctls (on the other
email thread).

regards,

DP

I thought btrfs scrub was part of btrfs-progs.  Pardon my ignorance if 
it isn't.


* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14  7:34   ` DanglingPointer
@ 2021-07-14  7:57     ` Qu Wenruo
  2021-07-14  9:19       ` DanglingPointer
From: Qu Wenruo @ 2021-07-14  7:57 UTC
  To: DanglingPointer, linux-btrfs



On 2021/7/14 3:34 PM, DanglingPointer wrote:
> "Why would you think btrfs-progs is the one that needs optimization?"
>
> Perhaps I should have given more context.  The data migration was taking
> a very long time (days), and the pauses due to "btrfs-transacti" were
> blocking all IO, including nfsd.  We thought, "should we run '$ btrfs
> scrub <mount>' to make sure nothing had gone wrong?"
>
> The problem is, scrubbing the whole RAID5 takes ages!

First things first: if your system may hit unexpected power loss or disk
corruption, it's highly recommended not to use btrfs RAID5.

(It's OK if you build btrfs on top of software/hardware RAID5.)

Btrfs RAID5 has the write-hole problem, meaning each power loss or disk
corruption slightly degrades the robustness of the RAID5.

With enough of these small degradations, some corruption will no longer be
repairable.
Scrub is the way to rebuild that robustness, so it's great that you're
already doing it, but I still wouldn't recommend btrfs RAID5 for now, just
as the btrfs(5) man page says.



Another reason why btrfs RAID5 scrub takes so long is how we do a full
filesystem scrub.

We initiate scrub for *each* device.

That means, if we have 4 disks, we will scrub all 4 disks separately.

For each device's scrub, we need to read all 4 stripes to make sure the
parity stripe is also fine.

But in theory we only need to read each such RAID5 stripe once, and then
all devices are covered.

So you can see we waste quite a lot of disk IO on duplicated checks.

That's also another reason we recommend RAID1/RAID10: during the scrub of
each device we really only need to read from that device, and only when
its data is corrupted do we try to read the other copy.

That wastes no extra IO.
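
To put rough, purely hypothetical numbers on the amplification: on a
4-disk RAID5 built from 8 TB disks, scrubbing each device separately
re-reads the full stripe every time, so a whole-fs scrub reads roughly the
number of devices times the raw array size:

  # hypothetical 4 x 8 TB RAID5, just to show the scale
  disks=4; per_disk_tb=8
  raw_tb=$((disks * per_disk_tb))               # 32 TB of raw stripes
  echo "single pass over all stripes: ${raw_tb} TB read"
  echo "per-device scrub model:       $((disks * raw_tb)) TB read"   # ~4x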

Thanks,
Qu


* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14  7:57     ` Qu Wenruo
@ 2021-07-14  9:19       ` DanglingPointer
  2021-07-14 12:35         ` Neal Gompa
From: DanglingPointer @ 2021-07-14  9:19 UTC
  To: Qu Wenruo, linux-btrfs

Yes, noted.

We're aware of the write-hole risk.  We have battery backup for both
workstations and automation to shut them down in the event of a power
outage.

Also, they are lab workstations, not production.  Data is backed up to two
locations.

The primary reason for RAID5 (or 6) is economics.  Money goes much further
with RAID5 compared to other RAID levels (1, 10, etc.) for the amount of
data storable in an array while still being able to lose a disk.  I'm sure
there are thousands of others out there in a similar situation to mine
where budgets are tight.

It would be good if at some point RAID56 could be looked at, fixed and
further optimised so it can be declared stable.  Thousands of people would
flock to btrfs, especially small and medium enterprises, orgs, charities,
home users, schools and labs.


* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14  9:19       ` DanglingPointer
@ 2021-07-14 12:35         ` Neal Gompa
  2021-07-14 13:01           ` Qu Wenruo
From: Neal Gompa @ 2021-07-14 12:35 UTC
  To: DanglingPointer; +Cc: Qu Wenruo, Btrfs BTRFS, Damien Le Moal

On Wed, Jul 14, 2021 at 5:22 AM DanglingPointer
<danglingpointerexception@gmail.com> wrote:
>
> Yes, noted.
>
> We're aware of the write-hole risk.  We have battery backup for both
> workstations and automation to shut them down in the event of a power
> outage.
>
> Also, they are lab workstations, not production.  Data is backed up to two
> locations.
>
> The primary reason for RAID5 (or 6) is economics.  Money goes much further
> with RAID5 compared to other RAID levels (1, 10, etc.) for the amount of
> data storable in an array while still being able to lose a disk.  I'm sure
> there are thousands of others out there in a similar situation to mine
> where budgets are tight.
>
> It would be good if at some point RAID56 could be looked at, fixed and
> further optimised so it can be declared stable.  Thousands of people would
> flock to btrfs, especially small and medium enterprises, orgs, charities,
> home users, schools and labs.
>

Btrfs RAID 5/6 code is being worked on[1], so this will be fixed
eventually. I personally look forward to this being resolved as
well...

[1]: https://lore.kernel.org/linux-btrfs/BL0PR04MB65144CAE288491C3FC6B0757E7489@BL0PR04MB6514.namprd04.prod.outlook.com/



-- 
真実はいつも一つ!/ Always, there's only one truth!

* Re: Enhancement Idea - Optional PGO+LTO build for btrfs-progs
  2021-07-14 12:35         ` Neal Gompa
@ 2021-07-14 13:01           ` Qu Wenruo
From: Qu Wenruo @ 2021-07-14 13:01 UTC
  To: Neal Gompa, DanglingPointer; +Cc: Btrfs BTRFS, Damien Le Moal



On 2021/7/14 8:35 PM, Neal Gompa wrote:
> Btrfs RAID 5/6 code is being worked on[1], so this will be fixed
> eventually. I personally look forward to this being resolved as
> well...
>
> [1]: https://lore.kernel.org/linux-btrfs/BL0PR04MB65144CAE288491C3FC6B0757E7489@BL0PR04MB6514.namprd04.prod.outlook.com/
>
From what I know, to solve the RAID5/6 write hole the real solution will
be a journal, just like what all the other software RAID56 implementations
do.

This means we will have a new RAID5/6 profile alongside the current
non-journaled RAID5/6.

Existing users will still need to convert to the new profiles; it's not
something that will be fixed magically.

Thanks,
Qu
