* feedback on three different scrub methods
@ 2017-11-06 20:17 Chris Murphy
  2017-11-07  0:54 ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2017-11-06 20:17 UTC (permalink / raw)
  To: Btrfs BTRFS, Qu Wenruo

I just did scrubs on five different volumes using all three scrub
methods. There were no errors found, so the test leaves uncertain what
the error handling differences are between the three scrubs. But at
the least with no errors, there are also no disagreements between the
three methods.

1. Speed. With both the normal online scrub and the btrfsck
--check-data-csum methods, I get ~275MB/s transfers per iotop. With
--offline scrub I'm seeing 60%-67% of that, or ~170MB/s, which is
substantially slower. If it used less CPU or left the system more
responsive, the extra time might be a worthwhile trade-off. But the
CPU % and system responsiveness for other tasks seemed the same; it
just took longer.
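
For reference, roughly how I invoked the three methods, with iotop
watching in another terminal (the mount point here is illustrative;
the device and binary paths are the ones from the transcripts below):

# kernel scrub, kept in the foreground with -B
$ sudo btrfs scrub start -B /mnt/scratch
# btrfs check with data checksum verification
$ sudo /srv/scratch/gitsworking/btrfs-progs/btrfs check --check-data-csum /dev/mapper/sdd
# Qu's offline scrub
$ sudo btrfs-progs/btrfs scrub start --offline /dev/mapper/sdd
# in a second terminal, watch only the processes doing IO
$ sudo iotop -o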

2. Each method reports different statistics in different formats. And
that's fine except I can't make heads or tails out of the information
presented, making it basically useless at least to this user.

[chris@f26s ~]$ sudo /srv/scratch/gitsworking/btrfs-progs/btrfs check
--check-data-csum /dev/mapper/sdd
Checking filesystem on /dev/mapper/sdd
UUID: f5adc913-bbea-4340-8b5f-3411e2cda642
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 601877024768 bytes used, no error found
total csum bytes: 586444884
total tree bytes: 1208827904
total fs tree bytes: 510672896
total extent tree bytes: 53755904
btree space waste bytes: 142065358
file data blocks allocated: 2213492944896
 referenced 1977848909824


[chris@f26s ~]$ sudo btrfs-progs/btrfs scrub start --offline
/dev/mapper/sdd
[sudo] password for chris:
Scrub result:
Tree bytes scrubbed: 2417655808
Tree extents scrubbed: 295124
Data bytes scrubbed: 2398080548864
Data extents scrubbed: 751158
Data bytes without csum: 297271296
Read error: 0
Verify error: 0
Csum error: 0

I'm not finding any way these add up. It seems like --offline's tree
bytes scrubbed + data bytes scrubbed should equal --check-data-csum's
found bytes minus waste bytes, but they don't. I have no idea what
I'm looking at or how it's useful to the user, but OK.
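
Concretely, just plugging the numbers above into shell arithmetic:

$ echo $(( 2417655808 + 2398080548864 ))  # offline: tree + data bytes scrubbed
2400498204672
$ echo $(( 601877024768 - 142065358 ))    # check: found bytes - waste bytes
601734959410

Those totals are nowhere near each other.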


I used kernel 4.13.10 and btrfs-progs 4.13.3 for the usual 'btrfs
scrub start' and the less common 'btrfs check --check-data-csum'
methods; and Qu's v4.11.1-89-gf939adf2 with offline_scrub checked out.




-- 
Chris Murphy


* Re: feedback on three different scrub methods
  2017-11-06 20:17 feedback on three different scrub methods Chris Murphy
@ 2017-11-07  0:54 ` Qu Wenruo
  2017-11-07  8:45   ` Nikolay Borisov
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2017-11-07  0:54 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS





On 2017-11-07 04:17, Chris Murphy wrote:
> I just did scrubs on five different volumes using all three scrub
> methods. There were no errors found, so the test leaves uncertain what
> the error handling differences are between the three scrubs. But at
> the least with no errors, there are also no disagreements between the
> three methods.
> 
> 1. Speed. With both the normal online scrub and the btrfsck
> --check-data-csum methods, I get ~275MB/s transfers per iotop. With
> --offline scrub I'm seeing 60%-67% of that, or ~170MB/s, which is
> substantially slower. If it used less CPU or left the system more
> responsive, the extra time might be a worthwhile trade-off. But the
> CPU % and system responsiveness for other tasks seemed the same; it
> just took longer.

First, --check-data-csum only checks the first copy if that copy is
good, so the other copies are skipped. That means --check-data-csum
is cheating a little here.

Furthermore, IIRC the current offline scrub branch doesn't have the
extent cache enabled, and since it relies completely on the extent
tree, quite a lot of time is spent on repeated tree searches, causing
extra random IO.

I could be totally wrong, but if offline scrub were rebased onto the
latest btrfs-progs (v4.13.3), it might show some small improvement.
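
Roughly, assuming the branch is still called offline_scrub and the
remote carrying the v4.13.3 tag is named upstream:

$ git fetch upstream --tags
$ git checkout offline_scrub
$ git rebase v4.13.3
$ ./autogen.sh && ./configure && make   # build as usual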

> 
> 2. Each method reports different statistics in different formats. And
> that's fine except I can't make heads or tails out of the information
> presented, making it basically useless at least to this user.

Offline scrub can in fact reuse the whole kernel scrub ioctl
structure, so it could report exactly the same content as the kernel.

However, for offline scrub my primary goal is to report errors in as
much detail as possible, in order to verify kernel scrub.

So it has different output, but that can be improved.

> 
> [chris@f26s ~]$ sudo /srv/scratch/gitsworking/btrfs-progs/btrfs check
> --check-data-csum /dev/mapper/sdd
> Checking filesystem on /dev/mapper/sdd
> UUID: f5adc913-bbea-4340-8b5f-3411e2cda642
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 601877024768 bytes used, no error found
> total csum bytes: 586444884
> total tree bytes: 1208827904
> total fs tree bytes: 510672896
> total extent tree bytes: 53755904
> btree space waste bytes: 142065358
> file data blocks allocated: 2213492944896
>  referenced 1977848909824
> 
> 
> [chris@f26s ~]$ sudo btrfs-progs/btrfs scrub start --offline
> /dev/mapper/sdd
> [sudo] password for chris:
> Scrub result:
> Tree bytes scrubbed: 2417655808
> Tree extents scrubbed: 295124
> Data bytes scrubbed: 2398080548864
> Data extents scrubbed: 751158
> Data bytes without csum: 297271296
> Read error: 0
> Verify error: 0
> Csum error: 0
> 
> I'm not finding any way these add up. It seems like --offline's tree
> bytes scrubbed + data bytes scrubbed should equal --check-data-csum's
> found bytes minus waste bytes, but they don't. I have no idea what
> I'm looking at or how it's useful to the user, but OK.

Offline scrub counts all copies for both data and metadata, so its
numbers will be larger than the btrfs check output.
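
For example, the tree figures above are consistent with exactly two
metadata copies (this is just arithmetic on the numbers quoted above;
the metadata profile itself isn't shown):

$ echo $(( 2 * 1208827904 ))  # 2 x 'total tree bytes' from btrfs check
2417655808                    # == 'Tree bytes scrubbed' from offline scrub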

But to be honest, I haven't reviewed the btrfs check numbers, so
something may be wrong there.

Thanks for the feedback,
Qu

> 
> 
> I used kernel 4.13.10 and btrfs-progs 4.13.3 for the usual 'btrfs
> scrub start' and the less common 'btrfs check --check-data-csum'
> methods; and Qu's v4.11.1-89-gf939adf2 with offline_scrub checked out.
> 
> 
> 
> 




* Re: feedback on three different scrub methods
  2017-11-07  0:54 ` Qu Wenruo
@ 2017-11-07  8:45   ` Nikolay Borisov
  2017-11-07  8:58     ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Nikolay Borisov @ 2017-11-07  8:45 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy, Btrfs BTRFS



On  7.11.2017 02:54, Qu Wenruo wrote:
> 
> 
> On 2017-11-07 04:17, Chris Murphy wrote:
>> I just did scrubs on five different volumes using all three scrub
>> methods. There were no errors found, so the test leaves uncertain what
>> the error handling differences are between the three scrubs. But at
>> the least with no errors, there are also no disagreements between the
>> three methods.
>>
>> 1. Speed. With both the normal online scrub and the btrfsck
>> --check-data-csum methods, I get ~275MB/s transfers per iotop. With
>> --offline scrub I'm seeing 60%-67% of that, or ~170MB/s, which is
>> substantially slower. If it used less CPU or left the system more
>> responsive, the extra time might be a worthwhile trade-off. But the
>> CPU % and system responsiveness for other tasks seemed the same; it
>> just took longer.
> 
> First, --check-data-csum only checks the first copy if that copy is
> good, so the other copies are skipped. That means --check-data-csum
> is cheating a little here.
> 
> Furthermore, IIRC the current offline scrub branch doesn't have the
> extent cache enabled, and since it relies completely on the extent
> tree, quite a lot of time is spent on repeated tree searches, causing
> extra random IO.
> 
> I could be totally wrong, but if offline scrub were rebased onto the
> latest btrfs-progs (v4.13.3), it might show some small improvement.
> 
>>
>> 2. Each method reports different statistics in different formats. And
>> that's fine except I can't make heads or tails out of the information
>> presented, making it basically useless at least to this user.
> 
> Offline scrub can in fact reuse the whole kernel scrub ioctl
> structure, so it could report exactly the same content as the kernel.
> 
> However, for offline scrub my primary goal is to report errors in as
> much detail as possible, in order to verify kernel scrub.

(Not an expert on scrub, just thinking out loud.) Having a scrub
separate from the kernel one for validation purposes sounds good.
However, does it make sense to have a switch to alternate between
output formats? I.e. if you want to validate kernel scrub, aka
testing, use the switch to get the very detailed output and see if it
makes sense. OTOH, if you want to use offline scrub instead of the
kernel one for whatever reason (perhaps this could be the default),
then just output the same format as the kernel?


> 
> So it has different output, but that can be improved.
> 
>>
>> [chris@f26s ~]$ sudo /srv/scratch/gitsworking/btrfs-progs/btrfs check
>> --check-data-csum /dev/mapper/sdd
>> Checking filesystem on /dev/mapper/sdd
>> UUID: f5adc913-bbea-4340-8b5f-3411e2cda642
>> checking extents
>> checking free space cache
>> checking fs roots
>> checking csums
>> checking root refs
>> found 601877024768 bytes used, no error found
>> total csum bytes: 586444884
>> total tree bytes: 1208827904
>> total fs tree bytes: 510672896
>> total extent tree bytes: 53755904
>> btree space waste bytes: 142065358
>> file data blocks allocated: 2213492944896
>>  referenced 1977848909824
>>
>>
>> [chris@f26s ~]$ sudo btrfs-progs/btrfs scrub start --offline
>> /dev/mapper/sdd
>> [sudo] password for chris:
>> Scrub result:
>> Tree bytes scrubbed: 2417655808
>> Tree extents scrubbed: 295124
>> Data bytes scrubbed: 2398080548864
>> Data extents scrubbed: 751158
>> Data bytes without csum: 297271296
>> Read error: 0
>> Verify error: 0
>> Csum error: 0
>>
>> I'm not finding any way these add up. It seems like --offline's tree
>> bytes scrubbed + data bytes scrubbed should equal --check-data-csum's
>> found bytes minus waste bytes, but they don't. I have no idea what
>> I'm looking at or how it's useful to the user, but OK.
> 
> Offline scrub counts all copies for both data and metadata, so its
> numbers will be larger than the btrfs check output.
> 
> But to be honest, I haven't reviewed the btrfs check numbers, so
> something may be wrong there.
> 
> Thanks for the feedback,
> Qu
> 
>>
>>
>> I used kernel 4.13.10 and btrfs-progs 4.13.3 for the usual 'btrfs
>> scrub start' and the less common 'btrfs check --check-data-csum'
>> methods; and Qu's v4.11.1-89-gf939adf2 with offline_scrub checked out.
>>
>>
>>
>>
> 


* Re: feedback on three different scrub methods
  2017-11-07  8:45   ` Nikolay Borisov
@ 2017-11-07  8:58     ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2017-11-07  8:58 UTC (permalink / raw)
  To: Nikolay Borisov, Chris Murphy, Btrfs BTRFS





On 2017-11-07 16:45, Nikolay Borisov wrote:
> 
> 
> On  7.11.2017 02:54, Qu Wenruo wrote:
>>
>>
>> On 2017-11-07 04:17, Chris Murphy wrote:
>>> I just did scrubs on five different volumes using all three scrub
>>> methods. There were no errors found, so the test leaves uncertain what
>>> the error handling differences are between the three scrubs. But at
>>> the least with no errors, there are also no disagreements between the
>>> three methods.
>>>
>>> 1. Speed. With both the normal online scrub and the btrfsck
>>> --check-data-csum methods, I get ~275MB/s transfers per iotop. With
>>> --offline scrub I'm seeing 60%-67% of that, or ~170MB/s, which is
>>> substantially slower. If it used less CPU or left the system more
>>> responsive, the extra time might be a worthwhile trade-off. But the
>>> CPU % and system responsiveness for other tasks seemed the same; it
>>> just took longer.
>>
>> First, --check-data-csum only checks the first copy if that copy is
>> good, so the other copies are skipped. That means --check-data-csum
>> is cheating a little here.
>>
>> Furthermore, IIRC the current offline scrub branch doesn't have the
>> extent cache enabled, and since it relies completely on the extent
>> tree, quite a lot of time is spent on repeated tree searches, causing
>> extra random IO.
>>
>> I could be totally wrong, but if offline scrub were rebased onto the
>> latest btrfs-progs (v4.13.3), it might show some small improvement.
>>
>>>
>>> 2. Each method reports different statistics in different formats. And
>>> that's fine except I can't make heads or tails out of the information
>>> presented, making it basically useless at least to this user.
>>
>> Offline scrub can in fact reuse the whole kernel scrub ioctl
>> structure, so it could report exactly the same content as the kernel.
>>
>> However, for offline scrub my primary goal is to report errors in as
>> much detail as possible, in order to verify kernel scrub.
> 
> (Not an expert on scrub, just thinking out loud.) Having a scrub
> separate from the kernel one for validation purposes sounds good.
> However, does it make sense to have a switch to alternate between
> output formats? I.e. if you want to validate kernel scrub, aka
> testing, use the switch to get the very detailed output and see if it
> makes sense. OTOH, if you want to use offline scrub instead of the
> kernel one for whatever reason (perhaps this could be the default),
> then just output the same format as the kernel?

Not yet, but it should not be hard to implement.

For example, output everything to stdout just like online scrub, and
use stderr for the detailed info.

In that case, everyone should be happy.
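
Then a user could keep the two apart with plain redirection (the file
names here are just examples, and this split isn't implemented yet):

$ sudo btrfs-progs/btrfs scrub start --offline /dev/mapper/sdd >summary.txt 2>details.txt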

Thanks,
Qu

> 
> 
>>
>> So it has different output, but that can be improved.
>>
>>>
>>> [chris@f26s ~]$ sudo /srv/scratch/gitsworking/btrfs-progs/btrfs check
>>> --check-data-csum /dev/mapper/sdd
>>> Checking filesystem on /dev/mapper/sdd
>>> UUID: f5adc913-bbea-4340-8b5f-3411e2cda642
>>> checking extents
>>> checking free space cache
>>> checking fs roots
>>> checking csums
>>> checking root refs
>>> found 601877024768 bytes used, no error found
>>> total csum bytes: 586444884
>>> total tree bytes: 1208827904
>>> total fs tree bytes: 510672896
>>> total extent tree bytes: 53755904
>>> btree space waste bytes: 142065358
>>> file data blocks allocated: 2213492944896
>>>  referenced 1977848909824
>>>
>>>
>>> [chris@f26s ~]$ sudo btrfs-progs/btrfs scrub start --offline
>>> /dev/mapper/sdd
>>> [sudo] password for chris:
>>> Scrub result:
>>> Tree bytes scrubbed: 2417655808
>>> Tree extents scrubbed: 295124
>>> Data bytes scrubbed: 2398080548864
>>> Data extents scrubbed: 751158
>>> Data bytes without csum: 297271296
>>> Read error: 0
>>> Verify error: 0
>>> Csum error: 0
>>>
>>> I'm not finding any way these add up. It seems like --offline's tree
>>> bytes scrubbed + data bytes scrubbed should equal --check-data-csum's
>>> found bytes minus waste bytes, but they don't. I have no idea what
>>> I'm looking at or how it's useful to the user, but OK.
>>
>> Offline scrub counts all copies for both data and metadata, so its
>> numbers will be larger than the btrfs check output.
>>
>> But to be honest, I haven't reviewed the btrfs check numbers, so
>> something may be wrong there.
>>
>> Thanks for the feedback,
>> Qu
>>
>>>
>>>
>>> I used kernel 4.13.10 and btrfs-progs 4.13.3 for the usual 'btrfs
>>> scrub start' and the less common 'btrfs check --check-data-csum'
>>> methods; and Qu's v4.11.1-89-gf939adf2 with offline_scrub checked out.
>>>
>>>
>>>
>>>
>>



