* Difficulties pushing FIO towards many small files and WORM-style use-case
@ 2020-12-02 14:31 David Pineau
  2020-12-02 18:55 ` Sitsofe Wheeler
  2020-12-03 15:43 ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 5+ messages in thread
From: David Pineau @ 2020-12-02 14:31 UTC (permalink / raw)
  To: fio

Hello,

First of all, allow me to thank the FIO developers for providing this
very complete tool to benchmark storage setups.

In the context of my work, I'm trying to compare two storage setups
using FIO, in preparation for a hardware upgrade of one of our
services.

As the use-case is well understood, I tried to reproduce it in the
FIO configuration file that you'll find later in this email. To give
you a bit more context, the hardware runs a home-written content
delivery cache which handles multiple layers of cache to distribute
pieces of data whose sizes range from 16k to 10MB.
As we have data on the actual usage of this current software, we know
the spread of accesses to various size ranges, and we rely on a huge
number of files accessed by the multi-threaded service. As the pieces
of data can live a long time on this service and are immutable, I'm
trying to go for a WORM-style workload with FIO.

With this information in mind, I built the following FIO configuration file:

>>>>
[global]
# File-related config
directory=/mnt/test-mountpoint
nrfiles=3000
file_service_type=random
create_on_open=1
allow_file_create=1
filesize=16k-10m

# Io type config
rw=randrw
unified_rw_reporting=0
randrepeat=0
fallocate=none
end_fsync=0
overwrite=0
fsync_on_close=1
rwmixread=90
# In an attempt to reproduce a similar usage skew as our service...
# Spread IOs unevenly, skewed toward a part of the dataset:
# - 60% of IOs on 20% of data,
# - 20% of IOs on 30% of data,
# - 20% of IOs on 50% of data
random_distribution=zoned:60/20:20/30:20/50
# 100% Random reads, 0% Random writes (thus sequential)
percentage_random=100,0
# Likewise, configure different blocksizes for seq (write) & random (read) ops
bs_is_seq_rand=1
blocksize_range=128k-10m,
# Here's the blocksizes repartitions retrieved from our metrics during 3 hours
# Normally, it should be random within ranges, but this mode
# only uses fixed-size blocks, so we'll consider it good enough.
bssplit=,8k/10:16k/7:32k/9:64k/22:128k/21:256k/12:512k/14:1m/3:10m/2

# Threads/processes/job sync settings
thread=1

# IO/data Verify options
verify=null # Don't consume CPU please !

# Measurements and reporting settings
#per_job_logs=1
disk_util=1

# Io Engine config
ioengine=libaio


[cache-layer2]
# Jobs settings
time_based=1
runtime=60
numjobs=175
size=200M
<<<<<

With this configuration, I'm obligated to use the CLI option
"--alloc-size=256M" otherwise the preparatory memory allocation fails
and aborts.
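
For completeness, the full command line then looks roughly like the
following (the jobfile name here is only an example):

	fio --alloc-size=256M cache-layer2.fio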

That being said, even with this setting I run into the following
issues, which I don't understand well enough to fix without your kind
help:
 - OOM messages once the run starts
 - Setting "norandommap" does not seem to help, although I thought
the memory issue was due to the random map kept for my many files &
workers.
 - Impossible to increase the number of jobs/threads or files: as
soon as I do, I'm back to the memory pre-allocation issue, and no
amount of memory seems to fix it (1g, 10g, etc.).
 - With these blockers, it seems impossible to push my current FIO
workload far enough to saturate my hardware (which is my aim).
 - I observe that if I increase "size", "numjobs" or "--alloc-size",
the READ throughput measured by FIO goes down while the WRITE
throughput increases. I understand that increasing the size of a
sequential write workload increases its throughput, but I'm at a loss
to explain the READ throughput behavior.

Do you have any advice on the configuration parameters I'm using to
push my hardware further towards its limits?
Is there any mechanism within FIO that I'm misunderstanding and that
is making this difficult?

In advance, thank you for your kind advice and help,

--
David Pineau



* Re: Difficulties pushing FIO towards many small files and WORM-style use-case
  2020-12-02 14:31 Difficulties pushing FIO towards many small files and WORM-style use-case David Pineau
@ 2020-12-02 18:55 ` Sitsofe Wheeler
  2020-12-03 17:43   ` David Pineau
  2020-12-03 15:43 ` Elliott, Robert (Servers)
  1 sibling, 1 reply; 5+ messages in thread
From: Sitsofe Wheeler @ 2020-12-02 18:55 UTC (permalink / raw)
  To: David Pineau; +Cc: fio

Hi,

On Wed, 2 Dec 2020 at 14:36, David Pineau <david.pineau@blade-group.com> wrote:
>
<snip>

> With this information in mind, I built the following FIO configuration file:
>
> >>>>
> [global]
> # File-related config
> directory=/mnt/test-mountpoint
> nrfiles=3000
> file_service_type=random
> create_on_open=1
> allow_file_create=1
> filesize=16k-10m
>
> # Io type config
> rw=randrw
> unified_rw_reporting=0
> randrepeat=0
> fallocate=none
> end_fsync=0
> overwrite=0
> fsync_on_close=1
> rwmixread=90
> # In an attempt to reproduce a similar usage skew as our service...
> # Spread IOs unevenly, skewed toward a part of the dataset:
> # - 60% of IOs on 20% of data,
> # - 20% of IOs on 30% of data,
> # - 20% of IOs on 50% of data
> random_distribution=zoned:60/20:20/30:20/50
> # 100% Random reads, 0% Random writes (thus sequential)
> percentage_random=100,0
> # Likewise, configure different blocksizes for seq (write) & random (read) ops
> bs_is_seq_rand=1
> blocksize_range=128k-10m,
> # Here's the blocksizes repartitions retrieved from our metrics during 3 hours
> # Normally, it should be random within ranges, but this mode
> # only uses fixed-size blocks, so we'll consider it good enough.
> bssplit=,8k/10:16k/7:32k/9:64k/22:128k/21:256k/12:512k/14:1m/3:10m/2
>
> # Threads/processes/job sync settings
> thread=1
>
> # IO/data Verify options
> verify=null # Don't consume CPU please !
>
> # Measurements and reporting settings
> #per_job_logs=1
> disk_util=1
>
> # Io Engine config
> ioengine=libaio
>
>
> [cache-layer2]
> # Jobs settings
> time_based=1
> runtime=60
> numjobs=175
> size=200M
> <<<<<
>
> With this configuration, I'm obligated to use the CLI option
> "--alloc-size=256M" otherwise the preparatory memory allocation fails
> and aborts.

<snip>

> Do you have any advice on the configuration parameters I'm using to
> push my hardware further towards its limits?
> Is there any mechanism within FIO that I'm misunderstanding and that
> is making this difficult?
>
> In advance, thank you for your kind advice and help,

Just to check, are you using the latest version of fio
(https://github.com/axboe/fio/releases) and if not, could you try the
latest one? Also, could you remove every option from your jobfile
that isn't needed to reproduce the problem and post the cut-down
version?

Thanks.

-- 
Sitsofe | http://sucs.org/~sits/



* RE: Difficulties pushing FIO towards many small files and WORM-style use-case
  2020-12-02 14:31 Difficulties pushing FIO towards many small files and WORM-style use-case David Pineau
  2020-12-02 18:55 ` Sitsofe Wheeler
@ 2020-12-03 15:43 ` Elliott, Robert (Servers)
  2020-12-03 16:06   ` David Pineau
  1 sibling, 1 reply; 5+ messages in thread
From: Elliott, Robert (Servers) @ 2020-12-03 15:43 UTC (permalink / raw)
  To: David Pineau, fio



> -----Original Message-----
> From: David Pineau <david.pineau@blade-group.com>
> Sent: Wednesday, December 02, 2020 8:32 AM
> To: fio@vger.kernel.org
> Subject: Difficulties pushing FIO towards many small files and WORM-
> style use-case
> 
> Hello,
> 
...
> As we have data on the actual usage of this current software, we know
> the spread of accesses to various size ranges, and we rely on a huge
> number of files accessed by the multi-threaded service. As the pieces
> of data can live a long time on this service and are immutable, I'm
> trying to go for a WORM-style workload with FIO.
> 
> With this information in mind, I built the following FIO
> configuration file:
> 
> >>>>
> [global]
> # File-related config
> directory=/mnt/test-mountpoint
> nrfiles=3000

I don't think most systems can handle 3000 open files at a time for
one process. Try
	ulimit -n

which might report that the default is 1024 open files per process.

The fio openfiles=<int> option can be used to limit how many files it
keeps open at a time.
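
For example, something along these lines in the job file would cap
how many of those files each job holds open at once (the value here
is only a placeholder):

	openfiles=128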




* Re: Difficulties pushing FIO towards many small files and WORM-style use-case
  2020-12-03 15:43 ` Elliott, Robert (Servers)
@ 2020-12-03 16:06   ` David Pineau
  0 siblings, 0 replies; 5+ messages in thread
From: David Pineau @ 2020-12-03 16:06 UTC (permalink / raw)
  To: Elliott, Robert (Servers); +Cc: fio

On Thu, Dec 3, 2020 at 4:43 PM Elliott, Robert (Servers)
<elliott@hpe.com> wrote:
>
>
>
> > -----Original Message-----
> > From: David Pineau <david.pineau@blade-group.com>
> > Sent: Wednesday, December 02, 2020 8:32 AM
> > To: fio@vger.kernel.org
> > Subject: Difficulties pushing FIO towards many small files and WORM-
> > style use-case
> >
> > Hello,
> >
> ...
> > As we have data on the actual usage of this current software, we know
> > the spread of accesses to various size ranges, and we rely on a huge
> > number of files accessed by the multi-threaded service. As the pieces
> > of data can live a long time on this service and are immutable, I'm
> > trying to go for a WORM-style workload with FIO.
> >
> > With this information in mind, I built the following FIO
> > configuration file:
> >
> > >>>>
> > [global]
> > # File-related config
> > directory=/mnt/test-mountpoint
> > nrfiles=3000
>
> I don't think most systems can handle 3000 open files at a time for
> one process. Try
>         ulimit -n
>
> which might report that the default is 1024 open files per process.
>
> The fio openfiles=<int> option can be used to limit how many files it
> keeps open at a time.

The number of open file descriptors is tuned on our servers
(according to our monitoring, the system usually has 30-40k files
open at any given time), so I'm not limited on that side, but I had
indeed noticed the "openfiles" option, thanks for hinting at it.




* Re: Difficulties pushing FIO towards many small files and WORM-style use-case
  2020-12-02 18:55 ` Sitsofe Wheeler
@ 2020-12-03 17:43   ` David Pineau
  0 siblings, 0 replies; 5+ messages in thread
From: David Pineau @ 2020-12-03 17:43 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

Hello,

I was using Debian Buster's packaged version (3.12, I believe). I'm
now using the latest version built from source, and it seems much
more cooperative on several fronts, though memory issues can still be
hit with a very large number of files. I tried to reproduce the
memory issue again but had difficulty reproducing the behavior I
previously observed; maybe the numbers I used are a bit unorthodox.
Since I could not reproduce it quickly, I have dropped the subject
for now (sorry Sitsofe, I hope that's not a bother?), as I'd rather
focus on my aim, if you'll indulge me.

So, I'd like to start with a few straightforward questions:
 - Does allocating the files for the read test ahead of time help
maximize throughput? (I feel like
 - Is there a list of options that are clearly at risk of affecting
the observed READ throughput?
 - Does the read/write randomness/spread/scheduling affect the
maximal throughput?
 - Is there a way to compute the required memory for the smalloc pools?

That being said, I'd still like to try to reproduce a workload
similar to our service's, with the aim of maximizing throughput, but
I'm observing the following behavior, which surprises me:
 - If I increase the number of files (nrfiles), the throughput goes down
 - If I increase the number of worker threads (numjobs), the
throughput goes down
 - If I increase the "size" of the data used by the job, the
throughput goes down

Note that this was done without modifying any other parameter (these
are all 60-second runs, in an attempt to reduce the skew from
short-lived runs).
While the specific shape of our workload may partly explain this
behavior, I'm surprised that on a RAID10 of 8 NVMe disks (3.8TB each)
I cannot drive random reads to the hardware's limits.
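
For illustration, a stripped-down job along the lines of the sketch
below is what I'm varying those knobs in (the values are placeholders
rather than my exact runs):

>>>>
[global]
directory=/mnt/test-mountpoint
ioengine=libaio
thread=1
rw=randread
bsrange=128k-1m
time_based=1
runtime=60

[cut-down-randread]
# The three knobs whose increase lowers the measured throughput:
nrfiles=100
numjobs=16
size=1g
<<<<<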








On Wed, Dec 2, 2020 at 7:55 PM Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> Hi,
>
> On Wed, 2 Dec 2020 at 14:36, David Pineau <david.pineau@blade-group.com> wrote:
> >
> <snip>
>
> > With this information in mind, I built the following FIO configuration file:
> >
> > >>>>
> > [global]
> > # File-related config
> > directory=/mnt/test-mountpoint
> > nrfiles=3000
> > file_service_type=random
> > create_on_open=1
> > allow_file_create=1
> > filesize=16k-10m
> >
> > # Io type config
> > rw=randrw
> > unified_rw_reporting=0
> > randrepeat=0
> > fallocate=none
> > end_fsync=0
> > overwrite=0
> > fsync_on_close=1
> > rwmixread=90
> > # In an attempt to reproduce a similar usage skew as our service...
> > # Spread IOs unevenly, skewed toward a part of the dataset:
> > # - 60% of IOs on 20% of data,
> > # - 20% of IOs on 30% of data,
> > # - 20% of IOs on 50% of data
> > random_distribution=zoned:60/20:20/30:20/50
> > # 100% Random reads, 0% Random writes (thus sequential)
> > percentage_random=100,0
> > # Likewise, configure different blocksizes for seq (write) & random (read) ops
> > bs_is_seq_rand=1
> > blocksize_range=128k-10m,
> > # Here's the blocksizes repartitions retrieved from our metrics during 3 hours
> > # Normally, it should be random within ranges, but this mode
> > # only uses fixed-size blocks, so we'll consider it good enough.
> > bssplit=,8k/10:16k/7:32k/9:64k/22:128k/21:256k/12:512k/14:1m/3:10m/2
> >
> > # Threads/processes/job sync settings
> > thread=1
> >
> > # IO/data Verify options
> > verify=null # Don't consume CPU please !
> >
> > # Measurements and reporting settings
> > #per_job_logs=1
> > disk_util=1
> >
> > # Io Engine config
> > ioengine=libaio
> >
> >
> > [cache-layer2]
> > # Jobs settings
> > time_based=1
> > runtime=60
> > numjobs=175
> > size=200M
> > <<<<<
> >
> > With this configuration, I'm obligated to use the CLI option
> > "--alloc-size=256M" otherwise the preparatory memory allocation fails
> > and aborts.
>
> <snip>
>
> > Do you have any advice on the configuration parameters I'm using to
> > push my hardware further towards its limits?
> > Is there any mechanism within FIO that I'm misunderstanding and that
> > is making this difficult?
> >
> > In advance, thank you for your kind advice and help,
>
> Just to check, are you using the latest version of fio
> (https://github.com/axboe/fio/releases) and if not, could you try the
> latest one? Also, could you remove every option from your jobfile
> that isn't needed to reproduce the problem and post the cut-down
> version?
>
> Thanks.
>
> --
> Sitsofe | http://sucs.org/~sits/


