* Is XFS suitable for 350 million files on 20TB storage?
@ 2014-09-05  9:47 Stefan Priebe - Profihost AG
  2014-09-05 12:30 ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-09-05  9:47 UTC (permalink / raw)
  To: xfs

Hi,

I have a backup system running 20TB of storage holding 350 million files.
This was working fine for months.

But now the free space is so heavily fragmented that I only see the
kworker threads at 4x 100% CPU and write speed being very slow. 15TB of
the 20TB are in use.

Overall there are 350 million files - all spread across different
directories, with at most 5000 per directory.

Kernel is 3.10.53 and mount options are:
noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota

# xfs_db -r -c freesp /dev/sda1
   from      to extents  blocks    pct
      1       1 29484138 29484138   2,16
      2       3 16930134 39834672   2,92
      4       7 16169985 87877159   6,45
      8      15 78202543 999838327  73,41
     16      31 3562456 83746085   6,15
     32      63 2370812 102124143   7,50
     64     127  280885 18929867   1,39
    256     511       2     827   0,00
    512    1023      65   35092   0,00
   2048    4095       2    6561   0,00
  16384   32767       1   23951   0,00

Is there anything I can optimize? Or is it just a bad idea to do this
with XFS? Any other options? Maybe rsync options like --inplace /
--no-whole-file?
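
For reference, the backup job is currently invoked roughly like this (only
a sketch - the real source/target paths differ), and the options in
question would simply be appended:

# rsync -aH --delete /source/host1/ /backup/host1/
# rsync -aH --delete --inplace --no-whole-file /source/host1/ /backup/host1/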

Greets,
Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05  9:47 Is XFS suitable for 350 million files on 20TB storage? Stefan Priebe - Profihost AG
@ 2014-09-05 12:30 ` Brian Foster
  2014-09-05 12:40   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-05 12:30 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: xfs

On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> i have a backup system running 20TB of storage having 350 million files.
> This was working fine for month.
> 
> But now the free space is so heavily fragmented that i only see the
> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> 20TB are in use.
> 
> Overall files are 350 Million - all in different directories. Max 5000
> per dir.
> 
> Kernel is 3.10.53 and mount options are:
> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> 
> # xfs_db -r -c freesp /dev/sda1
>    from      to extents  blocks    pct
>       1       1 29484138 29484138   2,16
>       2       3 16930134 39834672   2,92
>       4       7 16169985 87877159   6,45
>       8      15 78202543 999838327  73,41
>      16      31 3562456 83746085   6,15
>      32      63 2370812 102124143   7,50
>      64     127  280885 18929867   1,39
>     256     511       2     827   0,00
>     512    1023      65   35092   0,00
>    2048    4095       2    6561   0,00
>   16384   32767       1   23951   0,00
> 
> Is there anything i can optimize? Or is it just a bad idea to do this
> with XFS? Any other options? Maybe rsync options like --inplace /
> --no-whole-file?
> 

It's probably a good idea to include more information about your fs:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

... as well as what your typical workflow/dataset is for this fs. It
seems like you have relatively small files (15TB used across 350m files
is around 46k per file), yes? If so, I wonder if something like the
following commit introduced in 3.12 would help:

133eeb17 xfs: don't use speculative prealloc for small files
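
If you have a kernel git tree handy, something like this (just a sketch;
the abbreviated id needs to be unambiguous in your clone) shows where that
commit landed and whether a given stable tag already carries it:

git describe --contains 133eeb17
git merge-base --is-ancestor 133eeb17 v3.10.53 && echo present || echo missing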

Brian

> Greets,
> Stefan
> 

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 12:30 ` Brian Foster
@ 2014-09-05 12:40   ` Stefan Priebe - Profihost AG
  2014-09-05 13:48     ` Brian Foster
  2014-09-05 23:05     ` Dave Chinner
  0 siblings, 2 replies; 18+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-09-05 12:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs


On 05.09.2014 at 14:30, Brian Foster wrote:
> On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> i have a backup system running 20TB of storage having 350 million files.
>> This was working fine for month.
>>
>> But now the free space is so heavily fragmented that i only see the
>> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
>> 20TB are in use.
>>
>> Overall files are 350 Million - all in different directories. Max 5000
>> per dir.
>>
>> Kernel is 3.10.53 and mount options are:
>> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
>>
>> # xfs_db -r -c freesp /dev/sda1
>>    from      to extents  blocks    pct
>>       1       1 29484138 29484138   2,16
>>       2       3 16930134 39834672   2,92
>>       4       7 16169985 87877159   6,45
>>       8      15 78202543 999838327  73,41
>>      16      31 3562456 83746085   6,15
>>      32      63 2370812 102124143   7,50
>>      64     127  280885 18929867   1,39
>>     256     511       2     827   0,00
>>     512    1023      65   35092   0,00
>>    2048    4095       2    6561   0,00
>>   16384   32767       1   23951   0,00
>>
>> Is there anything i can optimize? Or is it just a bad idea to do this
>> with XFS? Any other options? Maybe rsync options like --inplace /
>> --no-whole-file?
>>
> 
> It's probably a good idea to include more information about your fs:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Generally, sure, but the problem itself is clear. If you look at the free
space allocation you can see that the free space is heavily fragmented.

But here you go:
- 3.10.53 vanilla
- xfs_repair version 3.1.11
- 16 cores
- /dev/sda1 /backup xfs
rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota 0 0
- Raid 10 with 1GB controller cache running in write back mode using 24
spinners
- no lvm
- no io waits
- xfs_info /serverbackup/
meta-data=/dev/sda1              isize=256    agcount=21, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=5369232896, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Anything missing?

> ... as well as what your typical workflow/dataset is for this fs. It
> seems like you have relatively small files (15TB used across 350m files
> is around 46k per file), yes?

Yes - most of them are even smaller. And some files are > 5GB.

> If so, I wonder if something like the
> following commit introduced in 3.12 would help:
> 
> 133eeb17 xfs: don't use speculative prealloc for small files

Looks interesting.

Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 12:40   ` Stefan Priebe - Profihost AG
@ 2014-09-05 13:48     ` Brian Foster
  2014-09-05 18:07       ` Stefan Priebe
  2014-09-05 23:05     ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-05 13:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: xfs

On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> 
> Am 05.09.2014 um 14:30 schrieb Brian Foster:
> > On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >>
> >> i have a backup system running 20TB of storage having 350 million files.
> >> This was working fine for month.
> >>
> >> But now the free space is so heavily fragmented that i only see the
> >> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> >> 20TB are in use.
> >>
> >> Overall files are 350 Million - all in different directories. Max 5000
> >> per dir.
> >>
> >> Kernel is 3.10.53 and mount options are:
> >> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> >>
> >> # xfs_db -r -c freesp /dev/sda1
> >>    from      to extents  blocks    pct
> >>       1       1 29484138 29484138   2,16
> >>       2       3 16930134 39834672   2,92
> >>       4       7 16169985 87877159   6,45
> >>       8      15 78202543 999838327  73,41
> >>      16      31 3562456 83746085   6,15
> >>      32      63 2370812 102124143   7,50
> >>      64     127  280885 18929867   1,39
> >>     256     511       2     827   0,00
> >>     512    1023      65   35092   0,00
> >>    2048    4095       2    6561   0,00
> >>   16384   32767       1   23951   0,00
> >>
> >> Is there anything i can optimize? Or is it just a bad idea to do this
> >> with XFS? Any other options? Maybe rsync options like --inplace /
> >> --no-whole-file?
> >>
> > 
> > It's probably a good idea to include more information about your fs:
> > 
> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
> Generally sure but the problem itself is clear. If you look at the free
> space allocation you see that free space is heavily fragmented.
> 
> But here you go:
> - 3.10.53 vanilla
> - xfs_repair version 3.1.11
> - 16 cores
> - /dev/sda1 /backup xfs
> rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota 0 0
> - Raid 10 with 1GB controller cache running in write back mode using 24
> spinners
> - no lvm
> - no io waits
> - xfs_info /serverbackup/
> meta-data=/dev/sda1              isize=256    agcount=21,
> agsize=268435455 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=5369232896, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> anything missing?
> 

What's the workload to the fs? Is it repeated rsync's from a constantly
changing dataset? Do the files change frequently or are they only ever
added/removed?

Also, what is the characterization of writes being "slow?" An rsync is
slower than normal? Sustained writes to a single file? How significant a
degradation?

Something like the following might be interesting as well:

for i in $(seq 0 20); do xfs_db -c "agi $i" -c "p freecount" <dev>; done
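
A variant that also prints the allocated inode count per AG would help put
those numbers in proportion (again just a sketch; the -r keeps xfs_db
read-only while the fs is mounted):

for i in $(seq 0 20); do xfs_db -r -c "agi $i" -c "p count" -c "p freecount" <dev>; done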

Brian

> > ... as well as what your typical workflow/dataset is for this fs. It
> > seems like you have relatively small files (15TB used across 350m files
> > is around 46k per file), yes?
> 
> Yes - most fo them are even smaller. And some files are > 5GB.
> 
> > If so, I wonder if something like the
> > following commit introduced in 3.12 would help:
> > 
> > 133eeb17 xfs: don't use speculative prealloc for small files
> 
> Looks interesting.
> 
> Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 13:48     ` Brian Foster
@ 2014-09-05 18:07       ` Stefan Priebe
  2014-09-05 19:18         ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Priebe @ 2014-09-05 18:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Hi,

On 05.09.2014 15:48, Brian Foster wrote:
> On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
>>
>> Am 05.09.2014 um 14:30 schrieb Brian Foster:
>>> On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> i have a backup system running 20TB of storage having 350 million files.
>>>> This was working fine for month.
>>>>
>>>> But now the free space is so heavily fragmented that i only see the
>>>> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
>>>> 20TB are in use.
>>>>
>>>> Overall files are 350 Million - all in different directories. Max 5000
>>>> per dir.
>>>>
>>>> Kernel is 3.10.53 and mount options are:
>>>> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
>>>>
>>>> # xfs_db -r -c freesp /dev/sda1
>>>>     from      to extents  blocks    pct
>>>>        1       1 29484138 29484138   2,16
>>>>        2       3 16930134 39834672   2,92
>>>>        4       7 16169985 87877159   6,45
>>>>        8      15 78202543 999838327  73,41
>>>>       16      31 3562456 83746085   6,15
>>>>       32      63 2370812 102124143   7,50
>>>>       64     127  280885 18929867   1,39
>>>>      256     511       2     827   0,00
>>>>      512    1023      65   35092   0,00
>>>>     2048    4095       2    6561   0,00
>>>>    16384   32767       1   23951   0,00
>>>>
>>>> Is there anything i can optimize? Or is it just a bad idea to do this
>>>> with XFS? Any other options? Maybe rsync options like --inplace /
>>>> --no-whole-file?
>>>>
>>>
>>> It's probably a good idea to include more information about your fs:
>>>
>>> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>>
>> Generally sure but the problem itself is clear. If you look at the free
>> space allocation you see that free space is heavily fragmented.
>>
>> But here you go:
>> - 3.10.53 vanilla
>> - xfs_repair version 3.1.11
>> - 16 cores
>> - /dev/sda1 /backup xfs
>> rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota 0 0
>> - Raid 10 with 1GB controller cache running in write back mode using 24
>> spinners
>> - no lvm
>> - no io waits
>> - xfs_info /serverbackup/
>> meta-data=/dev/sda1              isize=256    agcount=21,
>> agsize=268435455 blks
>>           =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=5369232896, imaxpct=5
>>           =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal               bsize=4096   blocks=521728, version=2
>>           =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> anything missing?
>>
>
> What's the workload to the fs? Is it repeated rsync's from a constantly
> changing dataset? Do the files change frequently or are they only ever
> added/removed?

Yes, it's repeated rsync with constantly changing files - about 10-20% of
all files change every week. A mixture of changing, removing and adding.

> Also, what is the characterization of writes being "slow?" An rsync is
> slower than normal? Sustained writes to a single file? How significant a
> degradation?

kworker is using all CPU while writing data to this XFS partition. rsync
can only write at a rate of 32-128 KB/s.

> Something like the following might be interesting as well:
> for i in $(seq 0 20); do xfs_db -c "agi $i" -c "p freecount" <dev>; done
freecount = 3189417
freecount = 1975726
freecount = 1309903
freecount = 1726846
freecount = 1271047
freecount = 1281956
freecount = 1571285
freecount = 1365473
freecount = 1238118
freecount = 1697011
freecount = 1000832
freecount = 1369791
freecount = 1706360
freecount = 1439165
freecount = 1656404
freecount = 1881762
freecount = 1593432
freecount = 1555909
freecount = 1197091
freecount = 1667467
freecount = 63

Thanks!

Stefan



> Brian
>
>>> ... as well as what your typical workflow/dataset is for this fs. It
>>> seems like you have relatively small files (15TB used across 350m files
>>> is around 46k per file), yes?
>>
>> Yes - most fo them are even smaller. And some files are > 5GB.
>>
>>> If so, I wonder if something like the
>>> following commit introduced in 3.12 would help:
>>>
>>> 133eeb17 xfs: don't use speculative prealloc for small files
>>
>> Looks interesting.
>>
>> Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 18:07       ` Stefan Priebe
@ 2014-09-05 19:18         ` Brian Foster
  2014-09-05 20:14           ` Stefan Priebe
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-05 19:18 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: xfs

On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
> Hi,
> 
> Am 05.09.2014 15:48, schrieb Brian Foster:
> >On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> >>
> >>Am 05.09.2014 um 14:30 schrieb Brian Foster:
> >>>On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> >>>>Hi,
> >>>>
> >>>>i have a backup system running 20TB of storage having 350 million files.
> >>>>This was working fine for month.
> >>>>
> >>>>But now the free space is so heavily fragmented that i only see the
> >>>>kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> >>>>20TB are in use.
> >>>>
> >>>>Overall files are 350 Million - all in different directories. Max 5000
> >>>>per dir.
> >>>>
> >>>>Kernel is 3.10.53 and mount options are:
> >>>>noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> >>>>
> >>>># xfs_db -r -c freesp /dev/sda1
> >>>>    from      to extents  blocks    pct
> >>>>       1       1 29484138 29484138   2,16
> >>>>       2       3 16930134 39834672   2,92
> >>>>       4       7 16169985 87877159   6,45
> >>>>       8      15 78202543 999838327  73,41
> >>>>      16      31 3562456 83746085   6,15
> >>>>      32      63 2370812 102124143   7,50
> >>>>      64     127  280885 18929867   1,39
> >>>>     256     511       2     827   0,00
> >>>>     512    1023      65   35092   0,00
> >>>>    2048    4095       2    6561   0,00
> >>>>   16384   32767       1   23951   0,00
> >>>>
> >>>>Is there anything i can optimize? Or is it just a bad idea to do this
> >>>>with XFS? Any other options? Maybe rsync options like --inplace /
> >>>>--no-whole-file?
> >>>>
> >>>
> >>>It's probably a good idea to include more information about your fs:
> >>>
> >>>http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >>
> >>Generally sure but the problem itself is clear. If you look at the free
> >>space allocation you see that free space is heavily fragmented.
> >>
> >>But here you go:
> >>- 3.10.53 vanilla
> >>- xfs_repair version 3.1.11
> >>- 16 cores
> >>- /dev/sda1 /backup xfs
> >>rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota 0 0
> >>- Raid 10 with 1GB controller cache running in write back mode using 24
> >>spinners
> >>- no lvm
> >>- no io waits
> >>- xfs_info /serverbackup/
> >>meta-data=/dev/sda1              isize=256    agcount=21,
> >>agsize=268435455 blks
> >>          =                       sectsz=512   attr=2
> >>data     =                       bsize=4096   blocks=5369232896, imaxpct=5
> >>          =                       sunit=0      swidth=0 blks
> >>naming   =version 2              bsize=4096   ascii-ci=0
> >>log      =internal               bsize=4096   blocks=521728, version=2
> >>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> >>realtime =none                   extsz=4096   blocks=0, rtextents=0
> >>
> >>anything missing?
> >>
> >
> >What's the workload to the fs? Is it repeated rsync's from a constantly
> >changing dataset? Do the files change frequently or are they only ever
> >added/removed?
> 
> Yes it repeated rsync with constant changing files. About 10-20% of all
> files every week. A mixture of changing, removing / adding.
> 

Ok.

> >Also, what is the characterization of writes being "slow?" An rsync is
> >slower than normal? Sustained writes to a single file? How significant a
> >degradation?
> 
> kworker is using all cpu while writing data to this xfs partition. rsync can
> just write at a rate of 32-128kb/s.
> 

Do you have a baseline? This seems highly subjective. By that I mean
this could be slower for copying a lot of little files, faster if you
happen to copy a single large file, etc.

> >Something like the following might be interesting as well:
> >for i in $(seq 0 20); do xfs_db -c "agi $i" -c "p freecount" <dev>; done
> freecount = 3189417
> freecount = 1975726
> freecount = 1309903
> freecount = 1726846
> freecount = 1271047
> freecount = 1281956
> freecount = 1571285
> freecount = 1365473
> freecount = 1238118
> freecount = 1697011
> freecount = 1000832
> freecount = 1369791
> freecount = 1706360
> freecount = 1439165
> freecount = 1656404
> freecount = 1881762
> freecount = 1593432
> freecount = 1555909
> freecount = 1197091
> freecount = 1667467
> freecount = 63
> 

Interesting, that seems like a lot of free inodes. That's 1-2 million in
each AG that we have to search through each time we want to allocate an
inode. I can't say for sure that's the source of the slowdown, but this
certainly looks like the kind of workload that inspired the addition of
the free inode btree (finobt) to more recent kernels.

It appears that you still have quite a bit of space available in
general. Could you run some local tests on this filesystem to try and
quantify how much of this degradation manifests on sustained writes vs.
file creation? For example, how is throughput when writing a few GB to a
local test file? How about with that same amount of data broken up
across a few thousand files?

Brian

P.S., Alternatively if you wanted to grab a metadump of this filesystem
and compress/upload it somewhere, I'd be interested to take a look at
it.

> Thanks!
> 
> Stefan
> 
> 
> 
> >Brian
> >
> >>>... as well as what your typical workflow/dataset is for this fs. It
> >>>seems like you have relatively small files (15TB used across 350m files
> >>>is around 46k per file), yes?
> >>
> >>Yes - most fo them are even smaller. And some files are > 5GB.
> >>
> >>>If so, I wonder if something like the
> >>>following commit introduced in 3.12 would help:
> >>>
> >>>133eeb17 xfs: don't use speculative prealloc for small files
> >>
> >>Looks interesting.
> >>
> >>Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 19:18         ` Brian Foster
@ 2014-09-05 20:14           ` Stefan Priebe
  2014-09-05 21:24             ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Priebe @ 2014-09-05 20:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs


On 05.09.2014 21:18, Brian Foster wrote:
...

> On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
> Interesting, that seems like a lot of free inodes. That's 1-2 million in
> each AG that we have to look around for each time we want to allocate an
> inode. I can't say for sure that's the source of the slowdown, but this
> certainly looks like the kind of workload that inspired the addition of
> the free inode btree (finobt) to more recent kernels.
>
> It appears that you still have quite a bit of space available in
> general. Could you run some local tests on this filesystem to try and
> quantify how much of this degradation manifests on sustained writes vs.
> file creation? For example, how is throughput when writing a few GB to a
> local test file?

Not sure if this is what you expect:

# dd if=/dev/zero of=bigfile oflag=direct,sync bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB) copied, 125,809 s, 33,3 MB/s

or without sync
# dd if=/dev/zero of=bigfile oflag=direct bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB) copied, 32,5474 s, 129 MB/s

 > How about with that same amount of data broken up
> across a few thousand files?

This results in heavy kworker usage.

4GB in 32kb files:
# time (mkdir test; for i in $(seq 1 1 131072); do \
      dd if=/dev/zero of=test/$i bs=32k count=1 oflag=direct,sync 2>/dev/null; \
  done)

...

55 min

> Brian
>
> P.S., Alternatively if you wanted to grab a metadump of this filesystem
> and compress/upload it somewhere, I'd be interested to take a look at
> it.

I think there might be file and directory names in it. If that is the
case, I can't do it.

Stefan


>
>> Thanks!
>>
>> Stefan
>>
>>
>>
>>> Brian
>>>
>>>>> ... as well as what your typical workflow/dataset is for this fs. It
>>>>> seems like you have relatively small files (15TB used across 350m files
>>>>> is around 46k per file), yes?
>>>>
>>>> Yes - most fo them are even smaller. And some files are > 5GB.
>>>>
>>>>> If so, I wonder if something like the
>>>>> following commit introduced in 3.12 would help:
>>>>>
>>>>> 133eeb17 xfs: don't use speculative prealloc for small files
>>>>
>>>> Looks interesting.
>>>>
>>>> Stefan


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 20:14           ` Stefan Priebe
@ 2014-09-05 21:24             ` Brian Foster
  2014-09-05 22:39               ` Sean Caron
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-05 21:24 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: xfs

On Fri, Sep 05, 2014 at 10:14:51PM +0200, Stefan Priebe wrote:
> 
> Am 05.09.2014 21:18, schrieb Brian Foster:
> ...
> 
> >On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
> >Interesting, that seems like a lot of free inodes. That's 1-2 million in
> >each AG that we have to look around for each time we want to allocate an
> >inode. I can't say for sure that's the source of the slowdown, but this
> >certainly looks like the kind of workload that inspired the addition of
> >the free inode btree (finobt) to more recent kernels.
> >
> >It appears that you still have quite a bit of space available in
> >general. Could you run some local tests on this filesystem to try and
> >quantify how much of this degradation manifests on sustained writes vs.
> >file creation? For example, how is throughput when writing a few GB to a
> >local test file?
> 
> Not sure if this is what you expect:
> 
> # dd if=/dev/zero of=bigfile oflag=direct,sync bs=4M count=1000
> 1000+0 records in
> 1000+0 records out
> 4194304000 bytes (4,2 GB) copied, 125,809 s, 33,3 MB/s
> 
> or without sync
> # dd if=/dev/zero of=bigfile oflag=direct bs=4M count=1000
> 1000+0 records in
> 1000+0 records out
> 4194304000 bytes (4,2 GB) copied, 32,5474 s, 129 MB/s
> 
> > How about with that same amount of data broken up
> >across a few thousand files?
> 
> This results in heavy kworker usage.
> 
> 4GB in 32kb files
> # time (mkdir test; for i in $(seq 1 1 131072); do dd if=/dev/zero
> of=test/$i bs=32k count=1 oflag=direct,sync 2>/dev/null; done)
> 
> ...
> 
> 55 min
> 

Both seem pretty slow in general. Any way you can establish a baseline
for these tests on this storage? If not, the only other suggestion I
could make is to allocate inodes until all of those freecount numbers
are accounted for and see if anything changes. That could certainly take
some time and it's not clear it will actually help.
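
To be clear, by "allocate inodes" I just mean creating a pile of empty
files somewhere on that fs, e.g. something like this rough sketch (the
directory name and the counts are made up - 200 x 5000 empty files eats
about a million free inodes, and it will take a while):

mkdir /backup/.inode-fill
for d in $(seq 1 200); do
    mkdir /backup/.inode-fill/$d
    for f in $(seq 1 5000); do touch /backup/.inode-fill/$d/$f; done
done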

> >Brian
> >
> >P.S., Alternatively if you wanted to grab a metadump of this filesystem
> >and compress/upload it somewhere, I'd be interested to take a look at
> >it.
> 
> I think there might be file and directory names in it. If this is the case i
> can't do it.
> 

It should enable obfuscation by default, but I would suggest restoring it
yourself and verifying that it meets your expectations.
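
Roughly like this (a sketch only - device and paths are examples, and
ideally the fs should be idle while the dump runs):

xfs_metadump -g /dev/sda1 /tmp/backup.metadump      # names are obfuscated by default
xfs_mdrestore /tmp/backup.metadump /tmp/backup.img  # restore to a sparse image
mkdir -p /mnt/check
mount -o loop,ro /tmp/backup.img /mnt/check         # spot-check the obfuscation
umount /mnt/check
xz /tmp/backup.metadump                             # then compress and upload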

Brian

> Stefan
> 
> 
> >
> >>Thanks!
> >>
> >>Stefan
> >>
> >>
> >>
> >>>Brian
> >>>
> >>>>>... as well as what your typical workflow/dataset is for this fs. It
> >>>>>seems like you have relatively small files (15TB used across 350m files
> >>>>>is around 46k per file), yes?
> >>>>
> >>>>Yes - most fo them are even smaller. And some files are > 5GB.
> >>>>
> >>>>>If so, I wonder if something like the
> >>>>>following commit introduced in 3.12 would help:
> >>>>>
> >>>>>133eeb17 xfs: don't use speculative prealloc for small files
> >>>>
> >>>>Looks interesting.
> >>>>
> >>>>Stefan
> 

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 21:24             ` Brian Foster
@ 2014-09-05 22:39               ` Sean Caron
  0 siblings, 0 replies; 18+ messages in thread
From: Sean Caron @ 2014-09-05 22:39 UTC (permalink / raw)
  To: Brian Foster, Sean Caron; +Cc: xfs, Stefan Priebe



Hi Stefan,

Generally speaking, this is a situation that you want to avoid. At 350
million files and 20 TB, you're looking at less than 60 KB per file on
average. That's pretty small. And with 350M files, a fair number of those
350M must be on the smaller side of things.

Memory is cheap these days... people can create 50 GB or 100 GB files and
read those things directly into memory, in full. And CPU cycles are pretty
cheap, too. You certainly get more bang per buck there than in IOPS on
your storage system!

Empirically, I have found from experience (currently running Linux 3.4.61;
many historical revs previous) in a reasonably large-scale (up to ~0.5 PB
in a single file system, up to 270 JBOD spindles on one machine) high-I/O
(jobs running on a few-hundred-node compute cluster, or a few hundred
threads running locally on the server) environment, that XFS (and things
running on top of it, ESPECIALLY rsync) will perform MUCH better on smaller
numbers of very large files than they will on very large numbers of small
files (I'm always trying to reinforce this to our end users).

I'm not really even saying XFS is to really blame here... in fact in 3.4.61
it has been very well-behaved; but Linux has many warts: poor
implementation of I/O and CPU scheduling algorithms; the kernel does not
degrade gracefully in resource-constrained settings; if you are ultimately
using this data store as a file share, the protocol implementations have
their own issues... NFS, CIFS, etc... Not trying to dog all the hardworking
free software devs out there but clearly much work remains to be done in
many areas, to make Linux really ready to play in the "big big leagues" of
computing (unless you have a local staff of good systems programmers with
some free time on their hands...). XFS is just one piece of the puzzle we
have to work with in trying to integrate a Linux system as a good
high-throughput storage machine.

If there is any way that you can use simple catenation or some kind of
archiver... even things as simple as shar, tar, zip... to get the file
sizes up and the absolute number of files down, you should notice some big
performance gains when trying to process your 20 TB worth of stuff.
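
For example, even something as simple as one tar per top-level directory
(just a sketch with made-up paths) gets you from thousands of tiny files
down to one big one:

cd /backup/source
for d in */ ; do tar -cf /archive/"${d%/}".tar "$d"; done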

If you can't dramatically increase individual file size while dramatically
reducing the absolute number of files for whatever reason in your
environment, I think you can still win by trying to reduce the number of
files in any one directory. You want to look out for directories that have
five or six figures' worth of files in them; those can be real performance
killers. If your claim of no more than 5,000 files per any directory is
accurate, that shouldn't be a big deal for XFS at all; I don't think you're
in bad shape there.

Rsync can be just the worst in this kind of scenario. It runs so slowly
that you sometimes feel like you might as well be on 10 Mbit Ethernet (or
worse).

I'm not sure exactly what your application is here... It sounds backup
related. If you're doing rsync, you can win a little bit by dropping down a
level or two in your directory hierarchy from the top of the tree where XFS
is mounted, and running a number of rsync threads in parallel,
per-directory, instead of just one top-level rsync thread for an entire
filesystem. Experiment to find the best number of threads; run too many and
they can deadlock, or just step all over one another.
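
A minimal sketch of what I mean, with made-up paths and four parallel
transfers (tune -P for your hardware):

cd /source
ls -1d */ | xargs -P 4 -I{} rsync -a --delete "{}" "/backup/{}"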

Also, I have a suspicion (sorry can't back this up quantitatively) that if
you are just trying to do a straight copy from here to there, a 'cp -Rp' will
be faster than an rsync. You might be better off doing an initial copy with
'cp -Rp' and then just synchronizing diffs at the end with an rsync pass,
rather than trying to do the whole thing with rsync.
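
i.e. something along these lines (sketch, made-up paths):

cp -Rp /source /backup/copy                  # initial bulk copy
rsync -a --delete /source/ /backup/copy/     # later passes only move the diffs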

Hope some of this might help... just casual thoughts from a daily
XFS-wrangler ;)

Best,

Sean




On Fri, Sep 5, 2014 at 5:24 PM, Brian Foster <bfoster@redhat.com> wrote:

> On Fri, Sep 05, 2014 at 10:14:51PM +0200, Stefan Priebe wrote:
> >
> > Am 05.09.2014 21:18, schrieb Brian Foster:
> > ...
> >
> > >On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
> > >Interesting, that seems like a lot of free inodes. That's 1-2 million in
> > >each AG that we have to look around for each time we want to allocate an
> > >inode. I can't say for sure that's the source of the slowdown, but this
> > >certainly looks like the kind of workload that inspired the addition of
> > >the free inode btree (finobt) to more recent kernels.
> > >
> > >It appears that you still have quite a bit of space available in
> > >general. Could you run some local tests on this filesystem to try and
> > >quantify how much of this degradation manifests on sustained writes vs.
> > >file creation? For example, how is throughput when writing a few GB to a
> > >local test file?
> >
> > Not sure if this is what you expect:
> >
> > # dd if=/dev/zero of=bigfile oflag=direct,sync bs=4M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 4194304000 bytes (4,2 GB) copied, 125,809 s, 33,3 MB/s
> >
> > or without sync
> > # dd if=/dev/zero of=bigfile oflag=direct bs=4M count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 4194304000 bytes (4,2 GB) copied, 32,5474 s, 129 MB/s
> >
> > > How about with that same amount of data broken up
> > >across a few thousand files?
> >
> > This results in heavy kworker usage.
> >
> > 4GB in 32kb files
> > # time (mkdir test; for i in $(seq 1 1 131072); do dd if=/dev/zero
> > of=test/$i bs=32k count=1 oflag=direct,sync 2>/dev/null; done)
> >
> > ...
> >
> > 55 min
> >
>
> Both seem pretty slow in general. Any way you can establish a baseline
> for these tests on this storage? If not, the only other suggestion I
> could make is to allocate inodes until all of those freecount numbers
> are accounted for and see if anything changes. That could certainly take
> some time and it's not clear it will actually help.
>
> > >Brian
> > >
> > >P.S., Alternatively if you wanted to grab a metadump of this filesystem
> > >and compress/upload it somewhere, I'd be interested to take a look at
> > >it.
> >
> > I think there might be file and directory names in it. If this is the
> case i
> > can't do it.
> >
>
> It should enable obfuscation by default, but I would suggest to restore
> it yourself and verify it meets your expectations.
>
> Brian
>
> > Stefan
> >
> >
> > >
> > >>Thanks!
> > >>
> > >>Stefan
> > >>
> > >>
> > >>
> > >>>Brian
> > >>>
> > >>>>>... as well as what your typical workflow/dataset is for this fs. It
> > >>>>>seems like you have relatively small files (15TB used across 350m
> files
> > >>>>>is around 46k per file), yes?
> > >>>>
> > >>>>Yes - most fo them are even smaller. And some files are > 5GB.
> > >>>>
> > >>>>>If so, I wonder if something like the
> > >>>>>following commit introduced in 3.12 would help:
> > >>>>>
> > >>>>>133eeb17 xfs: don't use speculative prealloc for small files
> > >>>>
> > >>>>Looks interesting.
> > >>>>
> > >>>>Stefan
> >


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 12:40   ` Stefan Priebe - Profihost AG
  2014-09-05 13:48     ` Brian Foster
@ 2014-09-05 23:05     ` Dave Chinner
  2014-09-06  7:35       ` Stefan Priebe
  2014-09-06 14:51       ` Brian Foster
  1 sibling, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2014-09-05 23:05 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Brian Foster, xfs

On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> 
> Am 05.09.2014 um 14:30 schrieb Brian Foster:
> > On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >>
> >> i have a backup system running 20TB of storage having 350 million files.
> >> This was working fine for month.
> >>
> >> But now the free space is so heavily fragmented that i only see the
> >> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> >> 20TB are in use.

What does perf tell you about the CPU being burnt? (i.e run perf top
for 10-20s while that CPU burn is happening and paste the top 10 CPU
consuming functions).
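
i.e. either watch it interactively or grab a short system-wide profile,
something like (a sketch):

perf top -g
perf record -a -g -- sleep 15
perf report --stdio --sort symbol | head -20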

> >>
> >> Overall files are 350 Million - all in different directories. Max 5000
> >> per dir.
> >>
> >> Kernel is 3.10.53 and mount options are:
> >> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> >>
> >> # xfs_db -r -c freesp /dev/sda1
> >>    from      to extents  blocks    pct
> >>       1       1 29484138 29484138   2,16
> >>       2       3 16930134 39834672   2,92
> >>       4       7 16169985 87877159   6,45
> >>       8      15 78202543 999838327  73,41

With an inode size of 256 bytes, this is going to be your real
problem soon - most of the free space is smaller than an inode
chunk so soon you won't be able to allocate new inodes, even though
there is free space on disk.

Unfortunately, there's not much we can do about this right now - we
need development in both user and kernel space to mitigate this
issue: sparse inode chunk allocation in kernel space, and free space
defragmentation in userspace. Both are on the near term development
list....

Also, the fact that there are almost 80 million 8-15 block extents
indicates that the CPU burn is likely coming from the by-size free
space search. We look up the first extent of the correct size, and
then do a linear search for a nearest extent of that size to the
target. Hence we could be searching millions of extents to find the
"nearest"....

> >>      16      31 3562456 83746085   6,15
> >>      32      63 2370812 102124143   7,50
> >>      64     127  280885 18929867   1,39
> >>     256     511       2     827   0,00
> >>     512    1023      65   35092   0,00
> >>    2048    4095       2    6561   0,00
> >>   16384   32767       1   23951   0,00
> >>
> >> Is there anything i can optimize? Or is it just a bad idea to do this
> >> with XFS?

No, it's not a bad idea. In fact, if you have this sort of use case,
XFS is really your only choice. In terms of optimisation, the only
thing that will really help performance is the new finobt structure.
That's a mkfs option and not an in-place change, though, so it's
unlikely to help.

FWIW, it may also help aging characteristics of this sort of
workload by improving inode allocation layout. That would be 
a side effect of being able to search the entire free inode tree
extremely quickly rather than allocating new chunks to keep down the CPU
time spent searching the allocated inode btree for free inodes. Hence it
would tend to more tightly pack inode chunks when they are allocated
on disk as it will fill full chunks before allocating new ones
elsewhere.

> >> Any other options? Maybe rsync options like --inplace /
> >> --no-whole-file?

For 350M files? I doubt there's much you can really do. Any sort of
large scale re-organisation is going to take a long, long time and
require lots of IO. If you are goign to take that route, you'd do
better to upgrade kernel and xfsprogs, then dump/mkfs.xfs -m
crc=1,finobt=1/restore. And you'd probably want to use a
multi-stream dump/restore so it can run operations concurrently and
hence at storage speed rather than being CPU bound....

Also, if the problem really is the number of identically sized free
space fragments in the freespace btrees, then the initial solution
is, again, a mkfs one. i.e. remake the filesystem with more, smaller
AGs to keep the number of extents the btrees need to index down to a
reasonable level. Say a couple of hundred AGs rather than 21?
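
In concrete terms, the reformat would look something like the sketch below
(it assumes upgraded xfsprogs, a scratch location big enough for the dump,
and agcount=200 purely as an example value; see xfsdump(8) for the
multi-stream syntax):

xfsdump -l 0 -L backup -M scratch -f /scratch/backup.xfsdump /backup
umount /backup
mkfs.xfs -f -m crc=1,finobt=1 -d agcount=200 /dev/sda1
mount /dev/sda1 /backup
xfsrestore -f /scratch/backup.xfsdump /backup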

> > If so, I wonder if something like the
> > following commit introduced in 3.12 would help:
> > 
> > 133eeb17 xfs: don't use speculative prealloc for small files
> 
> Looks interesting.

Probably won't make any difference because backups via rsync do
open/write/close and don't touch the file data again, so the close
will be removing speculative preallocation before the data is
written and extents are allocated by background writeback....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 23:05     ` Dave Chinner
@ 2014-09-06  7:35       ` Stefan Priebe
  2014-09-06 15:04         ` Brian Foster
  2014-09-06 14:51       ` Brian Foster
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Priebe @ 2014-09-06  7:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs

Hi Dave,

On 06.09.2014 01:05, Dave Chinner wrote:
> On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
>>
>> Am 05.09.2014 um 14:30 schrieb Brian Foster:
>>> On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> i have a backup system running 20TB of storage having 350 million files.
>>>> This was working fine for month.
>>>>
>>>> But now the free space is so heavily fragmented that i only see the
>>>> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
>>>> 20TB are in use.
>
> What does perf tell you about the CPU being burnt? (i.e run perf top
> for 10-20s while that CPU burn is happening and paste the top 10 CPU
> consuming functions).

here we go:
  15,79%  [kernel]            [k] xfs_inobt_get_rec
  14,57%  [kernel]            [k] xfs_btree_get_rec
  10,37%  [kernel]            [k] xfs_btree_increment
   7,20%  [kernel]            [k] xfs_btree_get_block
   6,13%  [kernel]            [k] xfs_btree_rec_offset
   4,90%  [kernel]            [k] xfs_dialloc_ag
   3,53%  [kernel]            [k] xfs_btree_readahead
   2,87%  [kernel]            [k] xfs_btree_rec_addr
   2,80%  [kernel]            [k] _xfs_buf_find
   1,94%  [kernel]            [k] intel_idle
   1,49%  [kernel]            [k] _raw_spin_lock
   1,13%  [kernel]            [k] copy_pte_range
   1,10%  [kernel]            [k] unmap_single_vma

>>>>
>>>> Overall files are 350 Million - all in different directories. Max 5000
>>>> per dir.
>>>>
>>>> Kernel is 3.10.53 and mount options are:
>>>> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
>>>>
>>>> # xfs_db -r -c freesp /dev/sda1
>>>>     from      to extents  blocks    pct
>>>>        1       1 29484138 29484138   2,16
>>>>        2       3 16930134 39834672   2,92
>>>>        4       7 16169985 87877159   6,45
>>>>        8      15 78202543 999838327  73,41
>
> With an inode size of 256 bytes, this is going to be your real
> problem soon - most of the free space is smaller than an inode
> chunk so soon you won't be able to allocate new inodes, even though
> there is free space on disk.
>
> Unfortunately, there's not much we can do about this right now - we
> need development in both user and kernel space to mitigate this
> issue: spare inode chunk allocation in kernel space, and free space
> defragmentation in userspace. Both are on the near term development
> list....
>
> Also, the fact that there are almost 80 million 8-15 block extents
> indicates that the CPU burn is likely coming from the by-size free
> space search. We look up the first extent of the correct size, and
> then do a linear search for a nearest extent of that size to the
> target. Hence we could be searching millions of extents to find the
> "nearest"....
>
>>>>       16      31 3562456 83746085   6,15
>>>>       32      63 2370812 102124143   7,50
>>>>       64     127  280885 18929867   1,39
>>>>      256     511       2     827   0,00
>>>>      512    1023      65   35092   0,00
>>>>     2048    4095       2    6561   0,00
>>>>    16384   32767       1   23951   0,00
>>>>
>>>> Is there anything i can optimize? Or is it just a bad idea to do this
>>>> with XFS?
>
> No, it's not a bad idea. In fact, if you have this sort of use case,
> XFS is really your only choice. In terms of optimisation, the only
> thing that will really help performance is the new finobt structure.
> That's a mkfs option andnot an in-place change, though, so it's
> unlikely to help.

I have no problem with reformatting the array - I have other backups.

> FWIW, it may also help aging characteristics of this sort of
> workload by improving inode allocation layout. That would be
> a side effect of being able to search the entire free inode tree
> extremely quickly rather than allocating new chunks to keep CPU time
> searching the allocate inode tree for free inodes down. Hence it
> would tend to more tightly pack inode chunks when they are allocated
> on disk as it will fill full chunks before allocating new ones
> elsewhere.
>
>>>> Any other options? Maybe rsync options like --inplace /
>>>> --no-whole-file?
>
> For 350M files? I doubt there's much you can really do. Any sort of
> large scale re-organisation is going to take a long, long time and
> require lots of IO. If you are goign to take that route, you'd do
> better to upgrade kernel and xfsprogs, then dump/mkfs.xfs -m
> crc=1,finobt=1/restore. And you'd probably want to use a
> multi-stream dump/restore so it can run operations concurrently and
> hence at storage speed rather than being CPU bound....

I don't need a backup, so reformatting is possible, but I really would
like to stay on 3.10. Is there anything I can backport, or do I really
need to upgrade? To which version at least?

> Also, if the problem really is the number of indentically sized free
> space fragments in the freespace btrees, then the initial solution
> is, again, a mkfs one. i.e. remake the filesystem with more, smaller
> AGs to keep the number of extents the btrees need to index down to a
> reasonable level. Say a couple of hundred AGs rather than 21?

mkfs chose 21 AGs automagically - it's nothing I set. Is this a bug, or do
I just need more AGs because of my special use case?

Thanks!

Stefan

>>> If so, I wonder if something like the
>>> following commit introduced in 3.12 would help:
>>>
>>> 133eeb17 xfs: don't use speculative prealloc for small files
>>
>> Looks interesting.
>
> Probably won't make any difference because backups via rsync do
> open/write/close and don't touch the file data again, so the close
> will be removing speculative preallocation before the data is
> written and extents are allocated by background writeback....
>
> Cheers,
>
> Dave.
>


* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-05 23:05     ` Dave Chinner
  2014-09-06  7:35       ` Stefan Priebe
@ 2014-09-06 14:51       ` Brian Foster
  2014-09-06 22:54         ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-06 14:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Stefan Priebe - Profihost AG

On Sat, Sep 06, 2014 at 09:05:28AM +1000, Dave Chinner wrote:
> On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> > 
> > Am 05.09.2014 um 14:30 schrieb Brian Foster:
> > > On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> > >> Hi,
> > >>
> > >> i have a backup system running 20TB of storage having 350 million files.
> > >> This was working fine for month.
> > >>
> > >> But now the free space is so heavily fragmented that i only see the
> > >> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> > >> 20TB are in use.
> 
> What does perf tell you about the CPU being burnt? (i.e run perf top
> for 10-20s while that CPU burn is happening and paste the top 10 CPU
> consuming functions).
> 
> > >>
> > >> Overall files are 350 Million - all in different directories. Max 5000
> > >> per dir.
> > >>
> > >> Kernel is 3.10.53 and mount options are:
> > >> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> > >>
> > >> # xfs_db -r -c freesp /dev/sda1
> > >>    from      to extents  blocks    pct
> > >>       1       1 29484138 29484138   2,16
> > >>       2       3 16930134 39834672   2,92
> > >>       4       7 16169985 87877159   6,45
> > >>       8      15 78202543 999838327  73,41
> 
> With an inode size of 256 bytes, this is going to be your real
> problem soon - most of the free space is smaller than an inode
> chunk so soon you won't be able to allocate new inodes, even though
> there is free space on disk.
> 

The extent list here is in fsb units, right? 256b inodes means 16k inode
chunks, in which case it seems like there's still plenty of room for
inode chunks (e.g., 8-15 blocks -> 32k-64k).

If you're at 350m inodes for 15T with 5T to go, that's 23.3m inodes per
TB and extrapolates to ~117m more to enospc. That's 1.8m inode chunks
out of the ~80m 8-15 block records currently free, and doesn't count the
20+ million inodes that seem to be scattered about the existing records
as well.

I certainly could be missing something here, but it seems like premature
enospc due to inode chunk allocation failure might not be an impending
problem here (likely because the smallest inode size is in use; the risk
seems to increase much more with the larger inode sizes)...

> Unfortunately, there's not much we can do about this right now - we
> need development in both user and kernel space to mitigate this
> issue: spare inode chunk allocation in kernel space, and free space
> defragmentation in userspace. Both are on the near term development
> list....
> 
> Also, the fact that there are almost 80 million 8-15 block extents
> indicates that the CPU burn is likely coming from the by-size free
> space search. We look up the first extent of the correct size, and
> then do a linear search for a nearest extent of that size to the
> target. Hence we could be searching millions of extents to find the
> "nearest"....
> 
> > >>      16      31 3562456 83746085   6,15
> > >>      32      63 2370812 102124143   7,50
> > >>      64     127  280885 18929867   1,39
> > >>     256     511       2     827   0,00
> > >>     512    1023      65   35092   0,00
> > >>    2048    4095       2    6561   0,00
> > >>   16384   32767       1   23951   0,00
> > >>
> > >> Is there anything i can optimize? Or is it just a bad idea to do this
> > >> with XFS?
> 
> No, it's not a bad idea. In fact, if you have this sort of use case,
> XFS is really your only choice. In terms of optimisation, the only
> thing that will really help performance is the new finobt structure.
> That's a mkfs option andnot an in-place change, though, so it's
> unlikely to help.
> 
> FWIW, it may also help aging characteristics of this sort of
> workload by improving inode allocation layout. That would be 
> a side effect of being able to search the entire free inode tree
> extremely quickly rather than allocating new chunks to keep CPU time
> searching the allocate inode tree for free inodes down. Hence it
> would tend to more tightly pack inode chunks when they are allocated
> on disk as it will fill full chunks before allocating new ones
> elsewhere.
> 
> > >> Any other options? Maybe rsync options like --inplace /
> > >> --no-whole-file?
> 
> For 350M files? I doubt there's much you can really do. Any sort of
> large scale re-organisation is going to take a long, long time and
> require lots of IO. If you are goign to take that route, you'd do
> better to upgrade kernel and xfsprogs, then dump/mkfs.xfs -m
> crc=1,finobt=1/restore. And you'd probably want to use a
> multi-stream dump/restore so it can run operations concurrently and
> hence at storage speed rather than being CPU bound....
> 
> Also, if the problem really is the number of indentically sized free
> space fragments in the freespace btrees, then the initial solution
> is, again, a mkfs one. i.e. remake the filesystem with more, smaller
> AGs to keep the number of extents the btrees need to index down to a
> reasonable level. Say a couple of hundred AGs rather than 21?
> 
> > > If so, I wonder if something like the
> > > following commit introduced in 3.12 would help:
> > > 
> > > 133eeb17 xfs: don't use speculative prealloc for small files
> > 
> > Looks interesting.
> 
> Probably won't make any difference because backups via rsync do
> open/write/close and don't touch the file data again, so the close
> will be removing speculative preallocation before the data is
> written and extents are allocated by background writeback....
> 

Yeah, good point. I was curious if there was an fsync involved somewhere
in the sequence here, but I didn't see rsync doing that anywhere. I
think we've seen that contribute to the aforementioned inode chunk
allocation problem when mixed with aggressive prealloc, but that was a
different application (openstack related, iirc).

That said, Stefan did mention that rsync can do file updates here. So
perhaps there is the possibility of seeing multiple file extensions and
writeback causing a similar kind of prealloc->convert->trim eofblocks
pattern across multiple backups? Either way, I agree that seems much
less likely to be a prominent contributor to the problem here.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-06  7:35       ` Stefan Priebe
@ 2014-09-06 15:04         ` Brian Foster
  2014-09-06 22:56           ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2014-09-06 15:04 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: xfs

On Sat, Sep 06, 2014 at 09:35:15AM +0200, Stefan Priebe wrote:
> Hi Dave,
> 
> Am 06.09.2014 01:05, schrieb Dave Chinner:
> >On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> >>
> >>Am 05.09.2014 um 14:30 schrieb Brian Foster:
> >>>On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> >>>>Hi,
> >>>>
> >>>>i have a backup system running 20TB of storage having 350 million files.
> >>>>This was working fine for month.
> >>>>
> >>>>But now the free space is so heavily fragmented that i only see the
> >>>>kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> >>>>20TB are in use.
> >
> >What does perf tell you about the CPU being burnt? (i.e run perf top
> >for 10-20s while that CPU burn is happening and paste the top 10 CPU
> >consuming functions).
> 
> here we go:
>  15,79%  [kernel]            [k] xfs_inobt_get_rec
>  14,57%  [kernel]            [k] xfs_btree_get_rec
>  10,37%  [kernel]            [k] xfs_btree_increment
>   7,20%  [kernel]            [k] xfs_btree_get_block
>   6,13%  [kernel]            [k] xfs_btree_rec_offset
>   4,90%  [kernel]            [k] xfs_dialloc_ag
>   3,53%  [kernel]            [k] xfs_btree_readahead
>   2,87%  [kernel]            [k] xfs_btree_rec_addr
>   2,80%  [kernel]            [k] _xfs_buf_find
>   1,94%  [kernel]            [k] intel_idle
>   1,49%  [kernel]            [k] _raw_spin_lock
>   1,13%  [kernel]            [k] copy_pte_range
>   1,10%  [kernel]            [k] unmap_single_vma
> 

The top 6 or so items look related to inode allocation, so that probably
confirms the primary bottleneck as searching around for free inodes out
of the existing inode chunks, precisely what the finobt is intended to
resolve. That was introduced in 3.16 kernels, so unfortunately it is not
available in 3.10.
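
For a before/after comparison once the filesystem is remade, it may be
worth capturing the profile non-interactively too, something along the
lines of (run while the kworker CPU burn is happening):

# perf record -a -g -- sleep 20
# perf report --stdio | head -n 30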

Brian

> >>>>
> >>>>Overall files are 350 Million - all in different directories. Max 5000
> >>>>per dir.
> >>>>
> >>>>Kernel is 3.10.53 and mount options are:
> >>>>noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> >>>>
> >>>># xfs_db -r -c freesp /dev/sda1
> >>>>    from      to extents  blocks    pct
> >>>>       1       1 29484138 29484138   2,16
> >>>>       2       3 16930134 39834672   2,92
> >>>>       4       7 16169985 87877159   6,45
> >>>>       8      15 78202543 999838327  73,41
> >
> >With an inode size of 256 bytes, this is going to be your real
> >problem soon - most of the free space is smaller than an inode
> >chunk so soon you won't be able to allocate new inodes, even though
> >there is free space on disk.
> >
> >Unfortunately, there's not much we can do about this right now - we
> >need development in both user and kernel space to mitigate this
> >issue: sparse inode chunk allocation in kernel space, and free space
> >defragmentation in userspace. Both are on the near term development
> >list....
> >
> >Also, the fact that there are almost 80 million 8-15 block extents
> >indicates that the CPU burn is likely coming from the by-size free
> >space search. We look up the first extent of the correct size, and
> >then do a linear search for a nearest extent of that size to the
> >target. Hence we could be searching millions of extents to find the
> >"nearest"....
> >
> >>>>      16      31 3562456 83746085   6,15
> >>>>      32      63 2370812 102124143   7,50
> >>>>      64     127  280885 18929867   1,39
> >>>>     256     511       2     827   0,00
> >>>>     512    1023      65   35092   0,00
> >>>>    2048    4095       2    6561   0,00
> >>>>   16384   32767       1   23951   0,00
> >>>>
> >>>>Is there anything i can optimize? Or is it just a bad idea to do this
> >>>>with XFS?
> >
> >No, it's not a bad idea. In fact, if you have this sort of use case,
> >XFS is really your only choice. In terms of optimisation, the only
> >thing that will really help performance is the new finobt structure.
> >That's a mkfs option and not an in-place change, though, so it's
> >unlikely to help.
> 
> I've no problem with reformatting the array. I've more backups.
> 
> >FWIW, it may also help aging characteristics of this sort of
> >workload by improving inode allocation layout. That would be
> >a side effect of being able to search the entire free inode tree
> >extremely quickly rather than allocating new chunks to keep CPU time
> >searching the allocated inode tree for free inodes down. Hence it
> >would tend to more tightly pack inode chunks when they are allocated
> >on disk as it will fill full chunks before allocating new ones
> >elsewhere.
> >
> >>>>Any other options? Maybe rsync options like --inplace /
> >>>>--no-whole-file?
> >
> >For 350M files? I doubt there's much you can really do. Any sort of
> >large scale re-organisation is going to take a long, long time and
> >require lots of IO. If you are going to take that route, you'd do
> >better to upgrade kernel and xfsprogs, then dump/mkfs.xfs -m
> >crc=1,finobt=1/restore. And you'd probably want to use a
> >multi-stream dump/restore so it can run operations concurrently and
> >hence at storage speed rather than being CPU bound....
> 
> I don't need a backup; reformatting is possible, but I really would like
> to stay at 3.10. Is there anything I can backport, or do I really need
> to upgrade? Which version at least?
> 
> >Also, if the problem really is the number of identically sized free
> >space fragments in the freespace btrees, then the initial solution
> >is, again, a mkfs one. i.e. remake the filesystem with more, smaller
> >AGs to keep the number of extents the btrees need to index down to a
> >reasonable level. Say a couple of hundred AGs rather than 21?
> 
> mkfs has chosen 21 automagically - it's nothing I've set. Is this a bug,
> or do I just need it because of my special use case?
> 
> Thanks!
> 
> Stefan
> 
> >>>If so, I wonder if something like the
> >>>following commit introduced in 3.12 would help:
> >>>
> >>>133eeb17 xfs: don't use speculative prealloc for small files
> >>
> >>Looks interesting.
> >
> >Probably won't make any difference because backups via rsync do
> >open/write/close and don't touch the file data again, so the close
> >will be removing speculative preallocation before the data is
> >written and extents are allocated by background writeback....
> >
> >Cheers,
> >
> >Dave.
> >
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-06 14:51       ` Brian Foster
@ 2014-09-06 22:54         ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2014-09-06 22:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Stefan Priebe - Profihost AG

On Sat, Sep 06, 2014 at 10:51:05AM -0400, Brian Foster wrote:
> On Sat, Sep 06, 2014 at 09:05:28AM +1000, Dave Chinner wrote:
> > On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> > > 
> > > Am 05.09.2014 um 14:30 schrieb Brian Foster:
> > > > On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> > > >> Hi,
> > > >>
> > > >> i have a backup system running 20TB of storage having 350 million files.
> > > >> This was working fine for month.
> > > >>
> > > >> But now the free space is so heavily fragmented that i only see the
> > > >> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> > > >> 20TB are in use.
> > 
> > What does perf tell you about the CPU being burnt? (i.e run perf top
> > for 10-20s while that CPU burn is happening and paste the top 10 CPU
> > consuming functions).
> > 
> > > >>
> > > >> Overall files are 350 Million - all in different directories. Max 5000
> > > >> per dir.
> > > >>
> > > >> Kernel is 3.10.53 and mount options are:
> > > >> noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> > > >>
> > > >> # xfs_db -r -c freesp /dev/sda1
> > > >>    from      to extents  blocks    pct
> > > >>       1       1 29484138 29484138   2,16
> > > >>       2       3 16930134 39834672   2,92
> > > >>       4       7 16169985 87877159   6,45
> > > >>       8      15 78202543 999838327  73,41
> > 
> > With an inode size of 256 bytes, this is going to be your real
> > problem soon - most of the free space is smaller than an inode
> > chunk so soon you won't be able to allocate new inodes, even though
> > there is free space on disk.
> > 
> 
> The extent list here is in fsb units, right? 256b inodes means 16k inode
> chunks, in which case it seems like there's still plenty of room for
> inode chunks (e.g., 8-15 blocks -> 32k-64k).

PEBKAC. My bad.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-06 15:04         ` Brian Foster
@ 2014-09-06 22:56           ` Dave Chinner
  2014-09-08  8:35             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2014-09-06 22:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Stefan Priebe

On Sat, Sep 06, 2014 at 11:04:13AM -0400, Brian Foster wrote:
> On Sat, Sep 06, 2014 at 09:35:15AM +0200, Stefan Priebe wrote:
> > Hi Dave,
> > 
> > Am 06.09.2014 01:05, schrieb Dave Chinner:
> > >On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
> > >>
> > >>Am 05.09.2014 um 14:30 schrieb Brian Foster:
> > >>>On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
> > >>>>Hi,
> > >>>>
> > >>>>i have a backup system running 20TB of storage having 350 million files.
> > >>>>This was working fine for month.
> > >>>>
> > >>>>But now the free space is so heavily fragmented that i only see the
> > >>>>kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
> > >>>>20TB are in use.
> > >
> > >What does perf tell you about the CPU being burnt? (i.e run perf top
> > >for 10-20s while that CPU burn is happening and paste the top 10 CPU
> > >consuming functions).
> > 
> > here we go:
> >  15,79%  [kernel]            [k] xfs_inobt_get_rec
> >  14,57%  [kernel]            [k] xfs_btree_get_rec
> >  10,37%  [kernel]            [k] xfs_btree_increment
> >   7,20%  [kernel]            [k] xfs_btree_get_block
> >   6,13%  [kernel]            [k] xfs_btree_rec_offset
> >   4,90%  [kernel]            [k] xfs_dialloc_ag
> >   3,53%  [kernel]            [k] xfs_btree_readahead
> >   2,87%  [kernel]            [k] xfs_btree_rec_addr
> >   2,80%  [kernel]            [k] _xfs_buf_find
> >   1,94%  [kernel]            [k] intel_idle
> >   1,49%  [kernel]            [k] _raw_spin_lock
> >   1,13%  [kernel]            [k] copy_pte_range
> >   1,10%  [kernel]            [k] unmap_single_vma
> > 
> 
> The top 6 or so items look related to inode allocation, so that probably
> confirms the primary bottleneck as searching around for free inodes out
> of the existing inode chunks, precisely what the finobt is intended to
> resolve. That was introduced in 3.16 kernels, so unfortunately it is not
> available in 3.10.

*nod*

Again, the only workaround for this on a non-finobt fs is to greatly
increase the number of AGs so there are fewer records in each btree to
search.
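
Purely as an illustration of that route (the AG count here is an
example, not a tuned recommendation): the current layout is visible in
the agcount= field of xfs_info output, and the AG count is a mkfs-time
knob, e.g.

# xfs_info /path/to/mountpoint
# mkfs.xfs -d agcount=200 <dev>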

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-06 22:56           ` Dave Chinner
@ 2014-09-08  8:35             ` Stefan Priebe - Profihost AG
  2014-09-08  9:46               ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-09-08  8:35 UTC (permalink / raw)
  To: Dave Chinner, Brian Foster; +Cc: xfs

Thanks,

upgraded to 3.16.2 and xfsprogs 3.2.1. Let's see how it behaves with finobt.
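
For the record, both can be confirmed on the box with:

# uname -r
# mkfs.xfs -V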

Greets,
Stefan


Am 07.09.2014 um 00:56 schrieb Dave Chinner:
> On Sat, Sep 06, 2014 at 11:04:13AM -0400, Brian Foster wrote:
>> On Sat, Sep 06, 2014 at 09:35:15AM +0200, Stefan Priebe wrote:
>>> Hi Dave,
>>>
>>> Am 06.09.2014 01:05, schrieb Dave Chinner:
>>>> On Fri, Sep 05, 2014 at 02:40:32PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>> Am 05.09.2014 um 14:30 schrieb Brian Foster:
>>>>>> On Fri, Sep 05, 2014 at 11:47:29AM +0200, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> i have a backup system running 20TB of storage having 350 million files.
>>>>>>> This was working fine for month.
>>>>>>>
>>>>>>> But now the free space is so heavily fragmented that i only see the
>>>>>>> kworker with 4x 100% CPU and write speed beeing very slow. 15TB of the
>>>>>>> 20TB are in use.
>>>>
>>>> What does perf tell you about the CPU being burnt? (i.e run perf top
>>>> for 10-20s while that CPU burn is happening and paste the top 10 CPU
>>>> consuming functions).
>>>
>>> here we go:
>>>  15,79%  [kernel]            [k] xfs_inobt_get_rec
>>>  14,57%  [kernel]            [k] xfs_btree_get_rec
>>>  10,37%  [kernel]            [k] xfs_btree_increment
>>>   7,20%  [kernel]            [k] xfs_btree_get_block
>>>   6,13%  [kernel]            [k] xfs_btree_rec_offset
>>>   4,90%  [kernel]            [k] xfs_dialloc_ag
>>>   3,53%  [kernel]            [k] xfs_btree_readahead
>>>   2,87%  [kernel]            [k] xfs_btree_rec_addr
>>>   2,80%  [kernel]            [k] _xfs_buf_find
>>>   1,94%  [kernel]            [k] intel_idle
>>>   1,49%  [kernel]            [k] _raw_spin_lock
>>>   1,13%  [kernel]            [k] copy_pte_range
>>>   1,10%  [kernel]            [k] unmap_single_vma
>>>
>>
>> The top 6 or so items look related to inode allocation, so that probably
>> confirms the primary bottleneck as searching around for free inodes out
>> of the existing inode chunks, precisely what the finobt is intended to
>> resolve. That was introduced in 3.16 kernels, so unfortunately it is not
>> available in 3.10.
> 
> *nod*
> 
> Again, the only workaround for this on a non-finobt fs is to greatly
> increase the number of AGs so there are fewer records in each btree to
> search.
> 
> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-08  8:35             ` Stefan Priebe - Profihost AG
@ 2014-09-08  9:46               ` Dave Chinner
  2014-09-08  9:49                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2014-09-08  9:46 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Brian Foster, xfs

On Mon, Sep 08, 2014 at 10:35:56AM +0200, Stefan Priebe - Profihost AG wrote:
> Thanks,
> 
> upgraded to 3.16.2 and xfsprogs 3.2.1. Let's see how it behaves with finobt.

You need to mkfs the filesystem to use finobt - it's not something
that an upgrade will just switch on. i.e.

# mkfs.xfs -m crc=1,finobt=1 <dev>
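
And if you want to carry the existing data over rather than repopulate
it from the sources, the multi-stream dump/restore mentioned earlier
would look roughly like this (stream count, labels and paths are just
placeholders - check xfsdump(8)/xfsrestore(8) for the exact syntax):

# xfsdump -l 0 -M str1 -M str2 -f /dump/stream1 -f /dump/stream2 /srv/backup
# xfsrestore -f /dump/stream1 -f /dump/stream2 /srv/backup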

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Is XFS suitable for 350 million files on 20TB storage?
  2014-09-08  9:46               ` Dave Chinner
@ 2014-09-08  9:49                 ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-09-08  9:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs


Am 08.09.2014 um 11:46 schrieb Dave Chinner:
> On Mon, Sep 08, 2014 at 10:35:56AM +0200, Stefan Priebe - Profihost AG wrote:
>> Thanks,
>>
>> upgraded to 3.16.2 and xfsprogs 3.2.1. Let's see how it behaves with finobt.
> 
> You need to mkfs the filesystem to use finobt - it's not something
> that an upgrade will just switch on. i.e.
> 
> # mkfs.xfs -m crc=1,finobt=1 <dev>

Sure, I also did a reformat using "-m crc=1,finobt=1".
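
xfs_info on the new filesystem (the mount point below is just a
placeholder) should show it - the exact output format varies between
xfsprogs versions, but there should be a finobt=1 in there:

# xfs_info /srv/backup | grep finobt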

Stefan

> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-09-08  9:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-05  9:47 Is XFS suitable for 350 million files on 20TB storage? Stefan Priebe - Profihost AG
2014-09-05 12:30 ` Brian Foster
2014-09-05 12:40   ` Stefan Priebe - Profihost AG
2014-09-05 13:48     ` Brian Foster
2014-09-05 18:07       ` Stefan Priebe
2014-09-05 19:18         ` Brian Foster
2014-09-05 20:14           ` Stefan Priebe
2014-09-05 21:24             ` Brian Foster
2014-09-05 22:39               ` Sean Caron
2014-09-05 23:05     ` Dave Chinner
2014-09-06  7:35       ` Stefan Priebe
2014-09-06 15:04         ` Brian Foster
2014-09-06 22:56           ` Dave Chinner
2014-09-08  8:35             ` Stefan Priebe - Profihost AG
2014-09-08  9:46               ` Dave Chinner
2014-09-08  9:49                 ` Stefan Priebe - Profihost AG
2014-09-06 14:51       ` Brian Foster
2014-09-06 22:54         ` Dave Chinner
