* Varying Leafsize and Nodesize in Btrfs
@ 2012-08-30 15:18 Mitch Harder
  2012-08-30 16:25 ` Josef Bacik
  2012-10-12 10:32 ` Martin Steigerwald
  0 siblings, 2 replies; 9+ messages in thread
From: Mitch Harder @ 2012-08-30 15:18 UTC (permalink / raw)
  To: linux-btrfs

I've been trying out different leafsize/nodesize settings by
benchmarking some typical operations.

These changes had more impact than I expected.  Using a
leafsize/nodesize of either 8192 or 16384 provided a noticeable
improvement in my limited testing.

These results are similar to some that Chris Mason has already
reported:  https://oss.oracle.com/~mason/blocksizes/

I noticed that metadata allocation was more efficient with bigger
block sizes.  My data was git kernel sources, which will utilize
btrfs' inlining.  This may have tilted the scales.

Read operations seemed to benefit the most.  Write operations seemed
to get punished when the leafsize/nodesize was increased to 64K.

Are there any known downsides to using a leafsize/nodesize bigger than
the default 4096?


Time (seconds) to finish 7 simultaneous copy operations on a set of
Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096         124.7 (1.25%)
8192         115.2 (0.69%)
16384        114.8 (0.53%)
65536        130.5 (0.3%)


Time (seconds) to finish 'git status' on a set of Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096          13.2 (0.86%)
8192          11.2 (1.36%)
16384          9.0 (0.92%)
65536          8.5 (1.3%)


Time (seconds) to perform a git checkout of a different branch on a
set of Linux kernel sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096          19.4 (1.1%)
8192          16.94 (3.1%)
16384         14.4 (0.6%)
65536         16.3 (0.8%)


Time (seconds) to perform 7 simultaneous rsync threads on the Linux
kernel git sources directories.

Leafsize/
Nodesize    Time (Std Dev%)
4096         410.3 (4.5%)
8192         289.8 (0.96%)
16384        250.7 (3.8%)
65536        227.0 (1.2%)


Used Metadata (MB) as reported by 'btrfs fi df'

Leafsize/
Nodesize    Size (Std Dev%)
4096         484 MB (0.13%)
8192         443 MB (0.2%)
16384        424 MB (0.2%)
65536        411 MB (0.2%)
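
To put the tables above on a common scale, the following Python sketch
computes the change of each result relative to the 4096 baseline.  The
numbers are copied from the tables; the dictionary layout and the
function names are illustrative assumptions, not part of the original
benchmark scripts.

```python
# Figures copied from the tables above. The labels and structure are
# assumptions made for this sketch only.
RESULTS = {
    "7x copy (s)":        {4096: 124.7, 8192: 115.2, 16384: 114.8, 65536: 130.5},
    "git status (s)":     {4096: 13.2,  8192: 11.2,  16384: 9.0,   65536: 8.5},
    "git checkout (s)":   {4096: 19.4,  8192: 16.94, 16384: 14.4,  65536: 16.3},
    "7x rsync (s)":       {4096: 410.3, 8192: 289.8, 16384: 250.7, 65536: 227.0},
    "used metadata (MB)": {4096: 484,   8192: 443,   16384: 424,   65536: 411},
}


def percent_change(baseline, value):
    """Signed change relative to the baseline, in percent (negative = lower)."""
    return (value - baseline) / baseline * 100.0


def summarize(results, baseline_size=4096):
    """Map each benchmark to {size: percent change vs. the baseline size}."""
    summary = {}
    for name, by_size in results.items():
        base = by_size[baseline_size]
        summary[name] = {
            size: round(percent_change(base, value), 1)
            for size, value in by_size.items()
            if size != baseline_size
        }
    return summary


if __name__ == "__main__":
    for name, changes in summarize(RESULTS).items():
        cells = "  ".join(f"{size}: {pct:+.1f}%" for size, pct in changes.items())
        print(f"{name:20s} {cells}")
```

Lower is better for every row, so a negative percentage is an
improvement over 4096.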

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 15:18 Varying Leafsize and Nodesize in Btrfs Mitch Harder
@ 2012-08-30 16:25 ` Josef Bacik
  2012-08-30 21:34   ` Martin Steigerwald
  2012-10-11 17:58   ` Phillip Susi
  2012-10-12 10:32 ` Martin Steigerwald
  1 sibling, 2 replies; 9+ messages in thread
From: Josef Bacik @ 2012-08-30 16:25 UTC (permalink / raw)
  To: Mitch Harder; +Cc: linux-btrfs

On Thu, Aug 30, 2012 at 09:18:07AM -0600, Mitch Harder wrote:
> I've been trying out different leafsize/nodesize settings by
> benchmarking some typical operations.
> 
> These changes had more impact than I expected.  Using a
> leafsize/nodesize of either 8192 or 16384 provided a noticeable
> improvement in my limited testing.
> 
> These results are similar to some that Chris Mason has already
> reported:  https://oss.oracle.com/~mason/blocksizes/
> 
> I noticed that metadata allocation was more efficient with bigger
> block sizes.  My data was git kernel sources, which will utilize
> btrfs' inlining.  This may have tilted the scales.
> 
> Read operations seemed to benefit the most.  Write operations seemed
> to get punished when the leafsize/nodesize was increased to 64K.
> 
> Are there any known downsides to using a leafsize/nodesize bigger than
> the default 4096?
> 

Once you cross some hardware-dependent threshold (usually past 32k) you start
incurring high memmove() overhead in most workloads.  As with all benchmarking,
it's good to test your workload and see what works best, but 16k should
generally be the best option.  Thanks,

Josef
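
As a rough illustration of why the memmove() cost grows with the node
size, here is a toy model in Python.  It is not btrfs code: it assumes
a node is a packed, sorted array of fixed-size items (ITEM_SIZE is a
hypothetical value), so inserting into the middle has to shift
everything behind the insertion slot.  It says nothing about where the
hardware-dependent threshold lies.

```python
ITEM_SIZE = 64  # hypothetical fixed item size in bytes; real items vary


def insert_item(node, used, slot, item):
    """Insert item at slot in a packed node; return the bytes shifted."""
    start = slot * ITEM_SIZE
    end = used * ITEM_SIZE
    # Shift the tail up by one item, like a memmove() over the node.
    node[start + ITEM_SIZE:end + ITEM_SIZE] = node[start:end]
    node[start:start + ITEM_SIZE] = item
    return end - start


def average_shift(node_size):
    """Average bytes shifted per insert into a half-full node,
    taken over every possible insertion slot."""
    used = (node_size // ITEM_SIZE) // 2
    total = 0
    for slot in range(used + 1):
        node = bytearray(node_size)
        total += insert_item(node, used, slot, b"\x01" * ITEM_SIZE)
    return total / (used + 1)


if __name__ == "__main__":
    for size in (4096, 8192, 16384, 32768, 65536):
        print(f"{size:6d}: {average_shift(size):8.0f} bytes shifted per insert")
```

In this model the average shift is a quarter of the node size, so it
grows linearly as the node gets bigger.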

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 16:25 ` Josef Bacik
@ 2012-08-30 21:34   ` Martin Steigerwald
  2012-08-30 21:50     ` Josef Bacik
  2012-08-31  5:02     ` Roman Mamedov
  2012-10-11 17:58   ` Phillip Susi
  1 sibling, 2 replies; 9+ messages in thread
From: Martin Steigerwald @ 2012-08-30 21:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Mitch Harder

On Thursday, 30 August 2012, Josef Bacik wrote:
> On Thu, Aug 30, 2012 at 09:18:07AM -0600, Mitch Harder wrote:
> > I've been trying out different leafsize/nodesize settings by
> > benchmarking some typical operations.
> > 
> > These changes had more impact than I expected.  Using a
> > leafsize/nodesize of either 8192 or 16384 provided a noticeable
> > improvement in my limited testing.
> > 
> > These results are similar to some that Chris Mason has already
> > reported:  https://oss.oracle.com/~mason/blocksizes/
> > 
> > I noticed that metadata allocation was more efficient with bigger
> > block sizes.  My data was git kernel sources, which will utilize
> > btrfs' inlining.  This may have tilted the scales.
> > 
> > Read operations seemed to benefit the most.  Write operations seemed
> > to get punished when the leafsize/nodesize was increased to 64K.
> > 
> > Are there any known downsides to using a leafsize/nodesize bigger
> > than the default 4096?
> 
> Once you cross some hardware-dependent threshold (usually past 32k) you
> start incurring high memmove() overhead in most workloads.  As with all
> benchmarking, it's good to test your workload and see what works best,
> but 16k should generally be the best option.  Thanks,

I wanted to ask about 32k as well.

I used 32k on one 2.5-inch external eSATA disk, but I have not measured
anything so far.

I wonder what a good value for an SSD might be. I tend not to use more
than 16k, but that's just a gut feeling right now, not based on a
well-founded explanation.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 21:34   ` Martin Steigerwald
@ 2012-08-30 21:50     ` Josef Bacik
  2012-08-31  0:01       ` Chris Mason
  2012-08-31  5:02     ` Roman Mamedov
  1 sibling, 1 reply; 9+ messages in thread
From: Josef Bacik @ 2012-08-30 21:50 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs, Josef Bacik, Mitch Harder

On Thu, Aug 30, 2012 at 03:34:49PM -0600, Martin Steigerwald wrote:
> On Thursday, 30 August 2012, Josef Bacik wrote:
> > On Thu, Aug 30, 2012 at 09:18:07AM -0600, Mitch Harder wrote:
> > > I've been trying out different leafsize/nodesize settings by
> > > benchmarking some typical operations.
> > > 
> > > These changes had more impact than I expected.  Using a
> > > leafsize/nodesize of either 8192 or 16384 provided a noticeable
> > > improvement in my limited testing.
> > > 
> > > These results are similar to some that Chris Mason has already
> > > reported:  https://oss.oracle.com/~mason/blocksizes/
> > > 
> > > I noticed that metadata allocation was more efficient with bigger
> > > block sizes.  My data was git kernel sources, which will utilize
> > > btrfs' inlining.  This may have tilted the scales.
> > > 
> > > Read operations seemed to benefit the most.  Write operations seemed
> > > to get punished when the leafsize/nodesize was increased to 64K.
> > > 
> > > Are there any known downsides to using a leafsize/nodesize bigger
> > > than the default 4096?
> > 
> > Once you cross some hardware-dependent threshold (usually past 32k) you
> > start incurring high memmove() overhead in most workloads.  As with all
> > benchmarking, it's good to test your workload and see what works best,
> > but 16k should generally be the best option.  Thanks,
> 
> I wanted to ask about 32k as well.
> 
> I used 32k on one 2.5-inch external eSATA disk, but I have not measured
> anything so far.
> 
> I wonder what a good value for an SSD might be. I tend not to use more
> than 16k, but that's just a gut feeling right now, not based on a
> well-founded explanation.
>

Generally speaking, everybody will be faster with 16k.  32k starts to depend
on your workload and hardware, and anything around 64k really starts to hurt
with memmove().  For this sort of thing, SSD vs. not isn't going to make much
of a difference: erase blocks tend to be several megabytes in size, so you
aren't going to get anywhere close to avoiding the internal read-modify-write
(RMW) cycle inside the SSD.  Thanks,

Josef
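
The size gap behind the erase-block argument can be made concrete with
a little arithmetic.  The sketch below assumes a 4 MiB erase block,
which is only a placeholder for "several megs"; real SSDs differ, and
the function names are made up for this example.

```python
# Assumed erase-block size. The message only says "several megs", so
# 4 MiB is a placeholder, not a figure for any particular SSD.
ASSUMED_ERASE_BLOCK = 4 * 1024 * 1024


def nodes_per_erase_block(node_size, erase_block=ASSUMED_ERASE_BLOCK):
    """How many metadata nodes of node_size fit in one erase block."""
    return erase_block // node_size


def fraction_of_erase_block(node_size, erase_block=ASSUMED_ERASE_BLOCK):
    """Share of one erase block that a single node write covers."""
    return node_size / erase_block


if __name__ == "__main__":
    for size in (4096, 16384, 32768, 65536):
        print(f"{size:6d}: {nodes_per_erase_block(size):5d} nodes per erase block, "
              f"{fraction_of_erase_block(size):.2%} of the block per write")
```

Under that assumption even a 64k node covers under 2% of an erase
block, so no node size in this range comes close to filling one.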

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 21:50     ` Josef Bacik
@ 2012-08-31  0:01       ` Chris Mason
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Mason @ 2012-08-31  0:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Martin Steigerwald, linux-btrfs, Mitch Harder

On Thu, Aug 30, 2012 at 03:50:08PM -0600, Josef Bacik wrote:
> On Thu, Aug 30, 2012 at 03:34:49PM -0600, Martin Steigerwald wrote:
> > I wonder what a good value for an SSD might be. I tend not to use more
> > than 16k, but that's just a gut feeling right now, not based on a
> > well-founded explanation.
> >
> 
> Generally speaking, everybody will be faster with 16k.  32k starts to depend
> on your workload and hardware, and anything around 64k really starts to hurt
> with memmove().  For this sort of thing, SSD vs. not isn't going to make much
> of a difference: erase blocks tend to be several megabytes in size, so you
> aren't going to get anywhere close to avoiding the internal read-modify-write
> (RMW) cycle inside the SSD.  Thanks,

I almost made 16k the default, but the problem is that it does increase
lock contention because bigger nodes mean fewer locks.  You can see this
with dbench and compilebench, especially early in the FS life.

My goal is to make the cow step of btrfs_search_slot really atomic, so
we don't have to switch to a blocking lock.  That will really fix a lot
of contention problems.

-chris
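
The "bigger nodes mean fewer locks" point can be put into rough
numbers.  The sketch below assumes one lock per tree block and reads
the "MB" in Mitch's used-metadata table as MiB; both are assumptions
of this example, and the result is only an order-of-magnitude
estimate.

```python
MIB = 1024 * 1024

# Used metadata from Mitch's table, read as MiB (an assumption about
# what 'btrfs fi df' means by "MB").
USED_METADATA_MIB = {4096: 484, 8192: 443, 16384: 424, 65536: 411}


def approx_tree_blocks(used_mib, node_size):
    """Rough count of tree blocks, and so of per-block locks,
    assuming one lock per block."""
    return used_mib * MIB // node_size


if __name__ == "__main__":
    for node_size, used_mib in USED_METADATA_MIB.items():
        blocks = approx_tree_blocks(used_mib, node_size)
        print(f"{node_size:6d}: about {blocks:7d} tree blocks")
```

With fewer blocks covering the same metadata, concurrent operations
are more likely to need the same block, and so the same lock.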


* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 21:34   ` Martin Steigerwald
  2012-08-30 21:50     ` Josef Bacik
@ 2012-08-31  5:02     ` Roman Mamedov
  1 sibling, 0 replies; 9+ messages in thread
From: Roman Mamedov @ 2012-08-31  5:02 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs, Josef Bacik, Mitch Harder

On Thu, 30 Aug 2012 23:34:49 +0200
Martin Steigerwald <Martin@lichtvoll.de> wrote:

> I wanted to ask about 32k as well.
> 
> I used 32k on one 2.5-inch external eSATA disk, but I have not measured
> anything so far.
> 
> I wonder what a good value for an SSD might be. I tend not to use more
> than 16k, but that's just a gut feeling right now, not based on a
> well-founded explanation.

If you look closely at https://oss.oracle.com/~mason/blocksizes/ , you will
notice that 16K delivers almost all of 32K's performance gains in "Read",
while not suffering from the slowdowns that 32K shows in "Create" and "Delete".

I have chosen 16K for my new /home partition (on an SSD+HDD mdadm RAID1).
What disappointed me at the time is that one can't seem to have a "mixed"
allocation FS with non-default leaf/node sizes.

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 16:25 ` Josef Bacik
  2012-08-30 21:34   ` Martin Steigerwald
@ 2012-10-11 17:58   ` Phillip Susi
  1 sibling, 0 replies; 9+ messages in thread
From: Phillip Susi @ 2012-10-11 17:58 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Mitch Harder, linux-btrfs

On 8/30/2012 12:25 PM, Josef Bacik wrote:
> Once you cross some hardware-dependent threshold (usually past 32k)
> you start incurring high memmove() overhead in most workloads.
> As with all benchmarking, it's good to test your workload and see what
> works best, but 16k should generally be the best option.  Thanks,
> 
> Josef

Why are memmove()s necessary, can they be avoided, and why do they
incur more overhead with 32k+ sizes?

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-08-30 15:18 Varying Leafsize and Nodesize in Btrfs Mitch Harder
  2012-08-30 16:25 ` Josef Bacik
@ 2012-10-12 10:32 ` Martin Steigerwald
  2012-10-12 12:52   ` Martin Steigerwald
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Steigerwald @ 2012-10-12 10:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Mitch Harder

On Thursday, 30 August 2012, Mitch Harder wrote:
> I've been trying out different leafsize/nodesize settings by
> benchmarking some typical operations.
> 
> These changes had more impact than I expected.  Using a
> leafsize/nodesize of either 8192 or 16384 provided a noticeable
> improvement in my limited testing.
> 
> These results are similar to some that Chris Mason has already
> reported:  https://oss.oracle.com/~mason/blocksizes/
> 
> I noticed that metadata allocation was more efficient with bigger
> block sizes.  My data was git kernel sources, which will utilize
> btrfs' inlining.  This may have tilted the scales.
> 
> Read operations seemed to benefit the most.  Write operations seemed
> to get punished when the leafsize/nodesize was increased to 64K.
> 
> Are there any known downsides to using a leafsize/nodesize bigger than
> the default 4096?
> 
> 
> Time (seconds) to finish 7 simultaneous copy operations on a set of
> Linux kernel git sources.
> 
> Leafsize/
> Nodesize    Time (Std Dev%)
> 4096         124.7 (1.25%)
> 8192         115.2 (0.69%)
> 16384        114.8 (0.53%)
> 65536        130.5 (0.3%)

Thanks for your testing, Mitch.

I would be interested in results for 32768 bytes as well.

Why?

It improves until 16384 bytes but then it gets worse with 65536 bytes. It 
would be interesting to know whether it improves for 32768 or already gets 
worse with that value :)

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

* Re: Varying Leafsize and Nodesize in Btrfs
  2012-10-12 10:32 ` Martin Steigerwald
@ 2012-10-12 12:52   ` Martin Steigerwald
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Steigerwald @ 2012-10-12 12:52 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Mitch Harder

On Friday, 12 October 2012, Martin Steigerwald wrote:
> > Time (seconds) to finish 7 simultaneous copy operations on a set of
> > Linux kernel git sources.
> >
> > 
> >
> > Leafsize/
> > Nodesize    Time (Std Dev%)
> > 4096         124.7 (1.25%)
> > 8192         115.2 (0.69%)
> > 16384        114.8 (0.53%)
> > 65536        130.5 (0.3%)
> 
> Thanks for your testing, Mitch.
> 
> I would be interested in results for 32768 bytes as well.
> 
> Why?
> 
> It improves until 16384 bytes but then it gets worse with 65536 bytes.
> It  would be interesting to know whether it improves for 32768 or
> already gets worse with that value :)

Please ignore. I was replying to an old thread that was shown at the top of
the message list, in order to reply to Phillip again. We had that topic already.

Sorry for the noise.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
