* Question about btrfs and XOR offloading
@ 2021-01-03 3:50 Rosen Penev
2021-01-03 6:53 ` Qu Wenruo
2021-01-04 14:44 ` David Sterba
0 siblings, 2 replies; 9+ messages in thread
From: Rosen Penev @ 2021-01-03 3:50 UTC
To: linux-btrfs
I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
know if there is a performance advantage to using a hardware
accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
XOR offload capability.
* Re: Question about btrfs and XOR offloading
2021-01-03 3:50 Question about btrfs and XOR offloading Rosen Penev
@ 2021-01-03 6:53 ` Qu Wenruo
2021-01-03 7:12 ` Rosen Penev
2021-01-04 14:44 ` David Sterba
1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-01-03 6:53 UTC
To: Rosen Penev, linux-btrfs
On 2021/1/3 11:50 AM, Rosen Penev wrote:
> I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> know if there is a performance advantage to using a hardware
> accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> XOR offload capability.
>
AFAIK XOR is only used by RAID56, and RAID56 is not considered safe
because of the write hole, so I don't believe hardware XOR support in
btrfs would make any difference for now.
Thanks,
Qu
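As a rough illustration of why RAID5/6-style parity relies on XOR, here is a
minimal Python sketch. It is not btrfs code: the function name xor_blocks and
the block contents are made up for the example. It shows that the parity of a
stripe is the XOR of its data blocks, and that XORing the surviving blocks
with the parity rebuilds a missing one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position in the blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks of one stripe (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second block is lost: XOR of the survivors and the parity
# gives it back, because every other block cancels itself out.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same property means that XORing all data blocks together with the parity
yields all zeroes, which is what a parity check can rely on.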
* Re: Question about btrfs and XOR offloading
2021-01-03 6:53 ` Qu Wenruo
@ 2021-01-03 7:12 ` Rosen Penev
0 siblings, 0 replies; 9+ messages in thread
From: Rosen Penev @ 2021-01-03 7:12 UTC
To: Qu Wenruo; +Cc: linux-btrfs
On Sat, Jan 2, 2021 at 10:53 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2021/1/3 11:50 AM, Rosen Penev wrote:
> > I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> > know if there is a performance advantage to using a hardware
> > accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> > XOR offload capability.
> >
> AFAIK XOR is only used by RAID56, and RAID56 is not considered safe
> because of the write hole, so I don't believe hardware XOR support in
> btrfs would make any difference for now.
Right, the question is about performance. I don't know whether making
XOR async would make any difference.
>
> Thanks,
> Qu
* Re: Question about btrfs and XOR offloading
2021-01-03 3:50 Question about btrfs and XOR offloading Rosen Penev
2021-01-03 6:53 ` Qu Wenruo
@ 2021-01-04 14:44 ` David Sterba
2021-01-04 22:56 ` Rosen Penev
1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2021-01-04 14:44 UTC
To: Rosen Penev; +Cc: linux-btrfs
On Sat, Jan 02, 2021 at 07:50:38PM -0800, Rosen Penev wrote:
> I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> know if there is a performance advantage to using a hardware
> accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> XOR offload capability.
Even if it's CPU only, it's accelerated, and the best algorithm is
selected at boot time:
[ 16.357703] raid6: avx2x4 gen() 30635 MB/s
[ 16.425701] raid6: avx2x4 xor() 10727 MB/s
[ 16.493701] raid6: avx2x2 gen() 32995 MB/s
[ 16.561701] raid6: avx2x2 xor() 19596 MB/s
[ 16.629701] raid6: avx2x1 gen() 26349 MB/s
[ 16.697710] raid6: avx2x1 xor() 17794 MB/s
[ 16.765701] raid6: sse2x4 gen() 17354 MB/s
[ 16.833701] raid6: sse2x4 xor() 9653 MB/s
[ 16.901706] raid6: sse2x2 gen() 18495 MB/s
[ 16.969702] raid6: sse2x2 xor() 11562 MB/s
[ 17.037701] raid6: sse2x1 gen() 14440 MB/s
[ 17.105818] raid6: sse2x1 xor() 10387 MB/s
[ 17.108300] raid6: using algorithm avx2x2 gen() 32995 MB/s
[ 17.110703] raid6: .... xor() 19596 MB/s, rmw enabled
[ 17.113587] raid6: using avx2x2 recovery algorithm
[ 17.327666] xor: automatically using best checksumming function avx
The xor/parity calculations are done synchronously, while offloading to
hw usually requires an asynchronous submit/wait mechanism. This brings
some overhead, so whether it pays off depends. The code in btrfs would
need to be adapted to work asynchronously, unless it's all somehow hidden
under the crypto API.
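To make the synchronous versus submit/wait distinction concrete, here is a
small Python sketch. It is only an analogy: a single-worker thread pool stands
in for an offload engine, and the names xor_pair, parity_sync and parity_async
are invented for the example. It does not use or imitate any kernel API.

```python
from concurrent.futures import ThreadPoolExecutor


def xor_pair(a, b):
    """XOR two equal-length buffers; stands in for the parity computation."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_sync(a, b):
    # Synchronous: the caller does the work itself and has the result
    # as soon as the call returns.
    return xor_pair(a, b)


def parity_async(engine, a, b):
    # Asynchronous: hand the request to the engine and get a handle back
    # immediately; the result has to be collected later.
    return engine.submit(xor_pair, a, b)


a = bytes(range(16))
b = bytes([0xFF]) * 16

# The thread pool is a hypothetical stand-in for a hardware engine.
with ThreadPoolExecutor(max_workers=1) as engine:
    pending = parity_async(engine, a, b)  # submit
    # ... the caller could do unrelated work here ...
    offloaded = pending.result()          # wait

direct = parity_sync(a, b)
```

Both paths produce the same bytes. The difference is that the asynchronous
caller has to keep the request state around between submit and wait, which is
the kind of restructuring the calling code would need.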
* Re: Question about btrfs and XOR offloading
2021-01-04 14:44 ` David Sterba
@ 2021-01-04 22:56 ` Rosen Penev
2021-01-05 15:33 ` David Sterba
0 siblings, 1 reply; 9+ messages in thread
From: Rosen Penev @ 2021-01-04 22:56 UTC
To: dsterba, Rosen Penev, linux-btrfs
On Mon, Jan 4, 2021 at 6:46 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Sat, Jan 02, 2021 at 07:50:38PM -0800, Rosen Penev wrote:
> > I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> > know if there is a performance advantage to using a hardware
> > accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> > XOR offload capability.
>
> Even if it's CPU only, it's accelerated, and the best algorithm is
> selected at boot time:
>
> [ 16.357703] raid6: avx2x4 gen() 30635 MB/s
> [ 16.425701] raid6: avx2x4 xor() 10727 MB/s
> [ 16.493701] raid6: avx2x2 gen() 32995 MB/s
> [ 16.561701] raid6: avx2x2 xor() 19596 MB/s
> [ 16.629701] raid6: avx2x1 gen() 26349 MB/s
> [ 16.697710] raid6: avx2x1 xor() 17794 MB/s
> [ 16.765701] raid6: sse2x4 gen() 17354 MB/s
> [ 16.833701] raid6: sse2x4 xor() 9653 MB/s
> [ 16.901706] raid6: sse2x2 gen() 18495 MB/s
> [ 16.969702] raid6: sse2x2 xor() 11562 MB/s
> [ 17.037701] raid6: sse2x1 gen() 14440 MB/s
> [ 17.105818] raid6: sse2x1 xor() 10387 MB/s
> [ 17.108300] raid6: using algorithm avx2x2 gen() 32995 MB/s
> [ 17.110703] raid6: .... xor() 19596 MB/s, rmw enabled
> [ 17.113587] raid6: using avx2x2 recovery algorithm
> [ 17.327666] xor: automatically using best checksumming function avx
Yeah...
[ 0.316064] raid6: neonx8 xor() 1087 MB/s
[ 0.452063] raid6: neonx4 xor() 1372 MB/s
[ 0.588064] raid6: neonx2 xor() 1610 MB/s
[ 0.724061] raid6: neonx1 xor() 1345 MB/s
[ 0.860072] raid6: int32x8 xor() 337 MB/s
[ 0.996092] raid6: int32x4 xor() 373 MB/s
[ 1.132087] raid6: int32x2 xor() 348 MB/s
[ 1.268090] raid6: int32x1 xor() 281 MB/s
[ 1.268093] raid6: .... xor() 1610 MB/s, rmw enabled
Not as fast here.
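For a quick side-by-side of the two logs, the following Python sketch parses
the ARM lines quoted above and compares the fastest xor() figure with the
avx2x2 xor() figure from the x86 log. The helper name xor_speeds and the
regular expression are assumptions made for this example. It only ranks the
numbers as printed and does not reproduce the kernel's own selection logic.

```python
import re

# The ARM boot log lines from this message, copied as quoted.
LOG = """\
[ 0.316064] raid6: neonx8 xor() 1087 MB/s
[ 0.452063] raid6: neonx4 xor() 1372 MB/s
[ 0.588064] raid6: neonx2 xor() 1610 MB/s
[ 0.724061] raid6: neonx1 xor() 1345 MB/s
[ 0.860072] raid6: int32x8 xor() 337 MB/s
[ 0.996092] raid6: int32x4 xor() 373 MB/s
[ 1.132087] raid6: int32x2 xor() 348 MB/s
[ 1.268090] raid6: int32x1 xor() 281 MB/s
[ 1.268093] raid6: .... xor() 1610 MB/s, rmw enabled
"""

# Matches benchmark lines such as "raid6: neonx2 xor() 1610 MB/s".
# The "...." summary line has no algorithm name and is skipped.
LINE = re.compile(r"raid6: (\w+)\s+(gen|xor)\(\)\s+(\d+) MB/s")


def xor_speeds(log):
    """Return {algorithm: MB/s} for every xor() benchmark line in the log."""
    speeds = {}
    for match in LINE.finditer(log):
        name, kind, mbps = match.groups()
        if kind == "xor":
            speeds[name] = int(mbps)
    return speeds


speeds = xor_speeds(LOG)
best = max(speeds, key=speeds.get)

X86_BEST_XOR = 19596  # avx2x2 xor() figure from the x86 log quoted above
ratio = X86_BEST_XOR / speeds[best]
print(f"{best}: {speeds[best]} MB/s, x86 is about {ratio:.1f}x faster")
```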
>
> The xor/parity calculations are done synchronously, while offloading to
> hw usually requires an asynchronous submit/wait mechanism. This brings
> some overhead, so whether it pays off depends. The code in btrfs would
> need to be adapted to work asynchronously, unless it's all somehow hidden
> under the crypto API.
* Re: Question about btrfs and XOR offloading
2021-01-04 22:56 ` Rosen Penev
@ 2021-01-05 15:33 ` David Sterba
2021-01-05 22:40 ` Rosen Penev
0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2021-01-05 15:33 UTC
To: Rosen Penev; +Cc: dsterba, linux-btrfs
On Mon, Jan 04, 2021 at 02:56:59PM -0800, Rosen Penev wrote:
> On Mon, Jan 4, 2021 at 6:46 AM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Sat, Jan 02, 2021 at 07:50:38PM -0800, Rosen Penev wrote:
> > > I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> > > know if there is a performance advantage to using a hardware
> > > accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> > > XOR offload capability.
> >
> > Even if it's CPU only, it's accelerated, and the best algorithm is
> > selected at boot time:
> >
> > [ 16.357703] raid6: avx2x4 gen() 30635 MB/s
> > [ 16.425701] raid6: avx2x4 xor() 10727 MB/s
> > [ 16.493701] raid6: avx2x2 gen() 32995 MB/s
> > [ 16.561701] raid6: avx2x2 xor() 19596 MB/s
> > [ 16.629701] raid6: avx2x1 gen() 26349 MB/s
> > [ 16.697710] raid6: avx2x1 xor() 17794 MB/s
> > [ 16.765701] raid6: sse2x4 gen() 17354 MB/s
> > [ 16.833701] raid6: sse2x4 xor() 9653 MB/s
> > [ 16.901706] raid6: sse2x2 gen() 18495 MB/s
> > [ 16.969702] raid6: sse2x2 xor() 11562 MB/s
> > [ 17.037701] raid6: sse2x1 gen() 14440 MB/s
> > [ 17.105818] raid6: sse2x1 xor() 10387 MB/s
> > [ 17.108300] raid6: using algorithm avx2x2 gen() 32995 MB/s
> > [ 17.110703] raid6: .... xor() 19596 MB/s, rmw enabled
> > [ 17.113587] raid6: using avx2x2 recovery algorithm
> > [ 17.327666] xor: automatically using best checksumming function avx
> Yeah...
>
> [ 0.316064] raid6: neonx8 xor() 1087 MB/s
> [ 0.452063] raid6: neonx4 xor() 1372 MB/s
> [ 0.588064] raid6: neonx2 xor() 1610 MB/s
> [ 0.724061] raid6: neonx1 xor() 1345 MB/s
> [ 0.860072] raid6: int32x8 xor() 337 MB/s
> [ 0.996092] raid6: int32x4 xor() 373 MB/s
> [ 1.132087] raid6: int32x2 xor() 348 MB/s
> [ 1.268090] raid6: int32x1 xor() 281 MB/s
> [ 1.268093] raid6: .... xor() 1610 MB/s, rmw enabled
>
> Not as fast here.
What's the raw speed of the hw offload? It should be measured on large
data so that the overhead is negligible.
It might make sense to add the async support if the speed is comparable
to or better than the CPU's, and also to reduce the CPU load.
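The point about measuring on large data can be illustrated with a simple
model in which every offload request pays a fixed submit/wait cost. In the
Python sketch below, the engine speed and the per-request overhead are
hypothetical placeholders, not measurements of any Marvell hardware. Only the
1610 MB/s CPU figure comes from the log earlier in the thread. Units are
treated loosely (MB and MiB are not distinguished).

```python
def effective_mbps(size_mb, raw_mbps, overhead_s):
    """Throughput seen by the caller when each request pays a fixed overhead."""
    return size_mb / (overhead_s + size_mb / raw_mbps)


def break_even_mb(cpu_mbps, raw_mbps, overhead_s):
    """Request size above which the offload finishes sooner than the CPU."""
    if raw_mbps <= cpu_mbps:
        return float("inf")  # a slower engine never catches up on latency
    # Solve size/cpu = overhead + size/raw for size.
    return overhead_s / (1.0 / cpu_mbps - 1.0 / raw_mbps)


CPU_MBPS = 1610     # neonx2 xor() figure reported earlier in the thread
ENGINE_MBPS = 2000  # hypothetical raw engine speed, not a measured value
OVERHEAD_S = 50e-6  # hypothetical per-request submit/wait cost

small = effective_mbps(4 / 1024, ENGINE_MBPS, OVERHEAD_S)  # one 4 KiB page
large = effective_mbps(64, ENGINE_MBPS, OVERHEAD_S)        # a 64 MB request
threshold = break_even_mb(CPU_MBPS, ENGINE_MBPS, OVERHEAD_S)

print(f"4 KiB request: {small:.0f} MB/s effective")
print(f"64 MB request: {large:.0f} MB/s effective")
print(f"break-even request size: {threshold:.3f} MB")
```

Under these assumed numbers, a single-page request is dominated by the fixed
cost, while a large request approaches the raw engine speed. That is why a
raw-speed measurement needs large buffers. The model says nothing about CPU
load, which can favour offloading even when the engine is not faster.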
* Re: Question about btrfs and XOR offloading
2021-01-05 15:33 ` David Sterba
@ 2021-01-05 22:40 ` Rosen Penev
2021-01-06 12:37 ` David Sterba
0 siblings, 1 reply; 9+ messages in thread
From: Rosen Penev @ 2021-01-05 22:40 UTC
To: dsterba, Rosen Penev, linux-btrfs
On Tue, Jan 5, 2021 at 7:35 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Jan 04, 2021 at 02:56:59PM -0800, Rosen Penev wrote:
> > On Mon, Jan 4, 2021 at 6:46 AM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Sat, Jan 02, 2021 at 07:50:38PM -0800, Rosen Penev wrote:
> > > > I've noticed that internally, btrfs' XOR code is CPU only. Does anyone
> > > > know if there is a performance advantage to using a hardware
> > > > accelerated path? I ask as I use BTRFS on a Marvell ARM platform with
> > > > XOR offload capability.
> > >
> > > Even if it's CPU only, it's accelerated, and the best algorithm is
> > > selected at boot time:
> > >
> > > [ 16.357703] raid6: avx2x4 gen() 30635 MB/s
> > > [ 16.425701] raid6: avx2x4 xor() 10727 MB/s
> > > [ 16.493701] raid6: avx2x2 gen() 32995 MB/s
> > > [ 16.561701] raid6: avx2x2 xor() 19596 MB/s
> > > [ 16.629701] raid6: avx2x1 gen() 26349 MB/s
> > > [ 16.697710] raid6: avx2x1 xor() 17794 MB/s
> > > [ 16.765701] raid6: sse2x4 gen() 17354 MB/s
> > > [ 16.833701] raid6: sse2x4 xor() 9653 MB/s
> > > [ 16.901706] raid6: sse2x2 gen() 18495 MB/s
> > > [ 16.969702] raid6: sse2x2 xor() 11562 MB/s
> > > [ 17.037701] raid6: sse2x1 gen() 14440 MB/s
> > > [ 17.105818] raid6: sse2x1 xor() 10387 MB/s
> > > [ 17.108300] raid6: using algorithm avx2x2 gen() 32995 MB/s
> > > [ 17.110703] raid6: .... xor() 19596 MB/s, rmw enabled
> > > [ 17.113587] raid6: using avx2x2 recovery algorithm
> > > [ 17.327666] xor: automatically using best checksumming function avx
> > Yeah...
> >
> > [ 0.316064] raid6: neonx8 xor() 1087 MB/s
> > [ 0.452063] raid6: neonx4 xor() 1372 MB/s
> > [ 0.588064] raid6: neonx2 xor() 1610 MB/s
> > [ 0.724061] raid6: neonx1 xor() 1345 MB/s
> > [ 0.860072] raid6: int32x8 xor() 337 MB/s
> > [ 0.996092] raid6: int32x4 xor() 373 MB/s
> > [ 1.132087] raid6: int32x2 xor() 348 MB/s
> > [ 1.268090] raid6: int32x1 xor() 281 MB/s
> > [ 1.268093] raid6: .... xor() 1610 MB/s, rmw enabled
> >
> > Not as fast here.
>
> What's the raw speed of the hw offload? It should be measured on large
> data so that the overhead is negligible.
I have no idea how to benchmark such a thing. I assume it could be
done indirectly.
>
> It might make sense to add the async support if the speed is comparable
> to or better than the CPU's, and also to reduce the CPU load.
I think the latter is the reason Marvell added hardware support for
doing parity calculations.
* Re: Question about btrfs and XOR offloading
2021-01-05 22:40 ` Rosen Penev
@ 2021-01-06 12:37 ` David Sterba
2021-01-06 21:52 ` Rosen Penev
0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2021-01-06 12:37 UTC
To: Rosen Penev; +Cc: dsterba, linux-btrfs
On Tue, Jan 05, 2021 at 02:40:28PM -0800, Rosen Penev wrote:
> > What's the raw speed of the hw offload? Measured on large data so that
> > the overhead is negligible.
> I have no idea how to benchmark such a thing. I assume it could be
> done indirectly.
> >
> > It might make sense to add the async support in case the speed is
> > comparable or better to the CPU, but also to reduce the CPU load.
> I think the latter is the reason Marvell added hardware support for
> doing parity calculations.
The support seems to be in NAS boxes, and besides xor and raid5/6
calculations the engine can also do a memcpy offload. This could gain a
lot of performance and be cheap in terms of code. Full page copies are
wrapped under copy_page, so we'd need to insert the offload code there.
The same goes for the raid5/6 calculations.
The MD-RAID already supports offloading so we have code to stea^Wcopy.
Overall it sounds worthwhile to add the async support to btrfs, as it
would help with the metadata updates too; there's a lot of memcpy/memmove.
* Re: Question about btrfs and XOR offloading
2021-01-06 12:37 ` David Sterba
@ 2021-01-06 21:52 ` Rosen Penev
0 siblings, 0 replies; 9+ messages in thread
From: Rosen Penev @ 2021-01-06 21:52 UTC
To: dsterba, Rosen Penev, linux-btrfs
On Wed, Jan 6, 2021 at 4:39 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Jan 05, 2021 at 02:40:28PM -0800, Rosen Penev wrote:
> > > What's the raw speed of the hw offload? Measured on large data so that
> > > the overhead is negligible.
> > I have no idea how to benchmark such a thing. I assume it could be
> > done indirectly.
> > >
> > > It might make sense to add the async support in case the speed is
> > > comparable or better to the CPU, but also to reduce the CPU load.
> > I think the latter is the reason Marvell added hardware support for
> > doing parity calculations.
>
> The support seems to be in NAS boxes, and besides xor and raid5/6
> calculations the engine can also do a memcpy offload. This could gain a
> lot of performance and be cheap in terms of code. Full page copies are
> wrapped under copy_page, so we'd need to insert the offload code there.
> The same goes for the raid5/6 calculations.
>
> The MD-RAID already supports offloading so we have code to stea^Wcopy.
> Overall it sounds worthwhile to add the async support to btrfs, as it
> would help with the metadata updates too; there's a lot of memcpy/memmove.
That would be lovely. I assume 24-hour btrfs scrubs would become shorter.
Unfortunately I lack the expertise to implement this properly.