* Slow file stat/deletion
@ 2016-11-25 10:40 Gionatan Danti
  2016-11-27 22:14 ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2016-11-25 10:40 UTC (permalink / raw)
  To: linux-xfs; +Cc: Gionatan Danti

Hi all,
I am using an XFS filesystem as a target for rsnapshot hardlink-based
backups.

Being hardlink-based, our backups are generally quite fast. However, I
noticed that for directories containing many small files, the longest
part of the backup process is removing the old (out-of-retention)
subdirs that must be purged to make room for the new backup iteration.

Further analysis shows that the slowest part of the 'rm' process is the
reading of the affected inodes/dentries. An example: to remove a subdir
with ~700000 files and directories, the system needs about 30 minutes.
At the same time, issuing a simple "find <dir>/ | wc -l" (after having
dropped the caches) needs ~24 minutes. In other words, just reading the
metadata takes about 4x as long as the remaining delete work.
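
For reference, I measured the two phases roughly like this (the path is
just a placeholder for one out-of-retention subdir; caches were dropped
before each run):

   echo 3 > /proc/sys/vm/drop_caches
   time find /backup/daily.30/ | wc -l      # ~24 minutes
   echo 3 > /proc/sys/vm/drop_caches
   time rm -rf /backup/daily.30/            # ~30 minutes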

So, my question is: is there anything I can do to speed up the
read/stat/deletion?

Here is my system config:
CPU: AMD Opteron(tm) Processor 4334
RAM: 16 GB
HDD: 12x 2TB WD RE in a RAID6 array (64k stripe unit), attached to a 
PERC H700 controller with 512MB BBU writeback cache
OS:  CentOS 7.2 x86_64 with 3.10.0-327.18.2.el7.x86_64 kernel

Relevant LVM setup:
LV           VG         Attr       LSize  Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
000-ThinPool vg_storage twi-aotz-- 10,85t                     86,71  38,53                            8,00m
Storage      vg_storage Vwi-aotz-- 10,80t 000-ThinPool        87,12                                   0

XFS filesystem info:
meta-data=/dev/mapper/vg_storage-Storage isize=512    agcount=32, agsize=90596992 blks
          =                       sectsz=512   attr=2, projid32bit=1
          =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2899103744, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=521728, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Some considerations:
1) I am using a single big thin volume because back then (>2 years ago)
I was not sure about XFS and, since it has no shrinking capability, I
relied on thin volume unmap should the filesystem choice change.
However, the thin pool's chunk size is quite big (8 MB), so it should
not pose acute fragmentation problems (it can be double-checked with
the lvs command sketched after this list);

2) due to being layered over a thinly provisioned volume, the filesystem
was created with the "noalign" option. I ran some in-the-lab tests on a
spare machine and I (still) find that this option seems to *lower* the
time needed to stat/delete files when XFS is on top of a thin volume, so
I do not think this is a problem. Am I right?

3) the filesystem is over 2 years old and has a very large number of
files on it (inode count is 12588595, but each inode has multiple
hardlinked names). Is this slow delete performance a side effect of
"aging"?

4) I have not changed the default read-ahead value (256 KB). I know this
is quite small compared to the available disk resources but, before
messing with low-level block device tuning, I would really like to hear
your opinion on my case (the commands I would use to check or change it
are sketched below).
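
For completeness, this is roughly how I would double-check the thin
pool chunk size and the current read-ahead, and how I would bump the
latter if advised (VG/LV names as above, dm-X as a placeholder;
blockdev values are in 512-byte sectors):

   lvs -o lv_name,chunk_size vg_storage
   blockdev --getra /dev/mapper/vg_storage-Storage        # 512 = 256 KB
   blockdev --setra 1024 /dev/mapper/vg_storage-Storage   # 512 KB
   cat /sys/block/dm-X/queue/read_ahead_kb                # same knob, in KB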

Thank you all.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: Slow file stat/deletion
  2016-11-25 10:40 Slow file stat/deletion Gionatan Danti
@ 2016-11-27 22:14 ` Dave Chinner
  2016-11-28  9:51   ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2016-11-27 22:14 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Fri, Nov 25, 2016 at 11:40:40AM +0100, Gionatan Danti wrote:
> Hi all,
> I am using an XFS filesystem as a target for rsnapshot hardlink-based
> backups.
> 
> Being hardlink-based, our backups are generally quite fast. However,
> I noticed that for directories containing many small files, the
> longest part of the backup process is removing the old
> (out-of-retention) subdirs that must be purged to make room for the
> new backup iteration.

Ah, hard link farms. aka "How to fragment the AGI btrees for fun and
profit."

> Further analysis shows that the slowest part of the 'rm' process is
> the reading of the affected inodes/dentries. An example: to remove a
> subdir with ~700000 files and directories, the system needs about 30
> minutes. At the same time, issuing a simple "find <dir>/ | wc -l"
> (after having dropped the caches) needs ~24 minutes. In other words,
> just reading the metadata takes about 4x as long as the remaining
> delete work.
> 
> So, my question is: is there anything I can do to speed up the
> read/stat/deletion?

Not now. Speed is a factor of the inode layout and seek times. Find
relies on sequential directory access, which is sped up on XFS by
internal btree readahead, and it doesn't require reading the extent
list. rm processes inodes one at a time and requires reading the
extent list, so per-inode there is more IO, a lot more CPU time spent
and more per-op latency; it's no surprise it's much slower than find.

> 
> Here is my system config:
> CPU: AMD Opteron(tm) Processor 4334
> RAM: 16 GB
> HDD: 12x 2TB WD RE in a RAID6 array (64k stripe unit), attached to a
> PERC H700 controller with 512MB BBU writeback cache
> OS:  CentOS 7.2 x86_64 with 3.10.0-327.18.2.el7.x86_64 kernel
> 
> Relevant LVM setup:
> LV           VG         Attr       LSize  Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
> 000-ThinPool vg_storage twi-aotz-- 10,85t                     86,71  38,53                            8,00m
> Storage      vg_storage Vwi-aotz-- 10,80t 000-ThinPool        87,12                                   0
> 
> XFS filesystem info:
> meta-data=/dev/mapper/vg_storage-Storage isize=512    agcount=32, agsize=90596992 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=0        finobt=0

finobt=0.

finobt was added primarily to solve inode allocation age-related
degradation for hard link farm style workloads. It will have
significant impact on unlink as well, because initial inode
allocation patterns will be better...
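
For reference, on any recent xfsprogs it is just a mkfs option (the
device below is your LV, purely as an example; finobt needs the v5/CRC
format):

  mkfs.xfs -m crc=1,finobt=1 /dev/mapper/vg_storage-Storage

and xfs_info will then report finobt=1.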

> Some considerations:
> 1) I am using a single big thin volume because back then (>2 years
> ago) I was not sure about XFS and, since it has no shrinking
> capability, I relied on thin volume unmap should the filesystem
> choice change. However, the thin pool's chunk size is quite big
> (8 MB), so it should not pose acute fragmentation problems;

Nope, but it means that what should be sequential IO is probably
going to be random. i.e. instead of directory/inode/extent reading
IO having minimum track-to-track seek latency because the blocks are
all nearby (1-2ms), they'll be average seeks (6-7ms) because locality
is no longer what the filesystem has optimised for.

> 
> 2) due to being layered over a thinly provisioned volume, the
> filesystem was created with the "noalign" option.

noalign affects data placement only, and only for filesystems that
have a stripe unit/width set, which yours doesn't:

>			sunit=0      swidth=0 blks

> I ran some in-the-lab
> tests on a spare machine and I (still) find that this option seems to
> *lower* the time needed to stat/delete files when XFS is on top of a
> thin volume, so I do not think this is a problem. Am I right?

It's not a problem because it doesn't do anything with your fs
config.

> 3) the filesystem is over 2 years old and has a very large number of
> files on it (inode count is 12588595, but each inode has multiple
> hardlinked names). Is this slow delete performance a side effect of
> "aging"?

Yes. Made worse by being on a thinp volume.


> 4) I have not changed the default read-ahead value (256 KB). I know
> this is quite small compared to the available disk resources but,
> before messing with low-level block device tuning, I would really
> like to hear your opinion on my case.

Only used for data readahead. Will make no difference to
directory/stat/unlink performance.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Slow file stat/deletion
  2016-11-27 22:14 ` Dave Chinner
@ 2016-11-28  9:51   ` Gionatan Danti
  2016-11-28 21:53     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2016-11-28  9:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Gionatan Danti



On 27/11/2016 23:14, Dave Chinner wrote:
>
> Ah, hard link farms. aka "How to fragment the AGI btrees for fun and
> profit."
>

Interesting... is there anything I can read about AGI fragmentation?

>
> Not now. Speed is a factor of the inode layout and seek times. Find
> relies on sequential directory access, which is sped up on XFS by
> internal btree readahead, and it doesn't require reading the extent
> list. rm processes inodes one at a time and requires reading the
> extent list, so per-inode there is more IO, a lot more CPU time spent
> and more per-op latency; it's no surprise it's much slower than
> find.
>

To tell the truth, "find" and "rm" show quite similar results: ~24 
minutes for the former, ~30 minutes for the latter. I perfectly 
understand that "rm" is going to be slower than find; my point is that 
*even* "find" seems quite slow...

>
> finobt=0.
>
> finobt was added primarily to solve inode allocation age-related
> degradation for hard link farm style workloads. It will have
> significant impact on unlink as well, because initial inode
> allocation patterns will be better...
>

This is very interesting information; thank you.

>
> Nope, but it means that what should be sequential IO is probably
> going to be random. i.e. instead of directory/inode/extent reading
> IO having minimum track-to-track seek latency because the blocks are
> all nearby (1-2ms), they'll be average seeks (6-7ms) because locality
> is no longer what the filesystem has optimised for.
>

Shouldn't the thinp overhead be minimized by the big (8 MB) chunk size?
Are inode allocations really so scattered across LBAs? Maybe the
slowdown is worsened by bad journal placement (I imagine it is near the
start of the disk, while current read/write activity surely happens near
the end)?

>
> noalign affects data placement only, and only for filesystems that
> have a stripe unit/width set, which yours doesn't:
>
>> 			sunit=0      swidth=0 blks

Isn't that the expected result of "noalign"? By opting for "noalign" I am
telling mkfs to discard any stripe information, right?

>
> Yes. Made worse by being on a thinp volume.
>

Can't I do anything about that?

>
> Only used for data readahead. Will make no difference to
> directory/stat/unlink performance.

Thank you again for valuable information.

>
> Cheers,
>
> Dave.
>

Thanks Dave.


-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: Slow file stat/deletion
  2016-11-28  9:51   ` Gionatan Danti
@ 2016-11-28 21:53     ` Dave Chinner
  2016-11-29  7:53       ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2016-11-28 21:53 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Mon, Nov 28, 2016 at 10:51:42AM +0100, Gionatan Danti wrote:
> 
> 
> On 27/11/2016 23:14, Dave Chinner wrote:
> >
> >Ah, hard link farms. aka "How to fragment the AGI btrees for fun and
> >profit."
> >
> 
> Interesting... is there anything I can read about AGI fragmentation?

Read up on finobt and the bug reports on the list about how inode
allocation slows to a crawl....

> >Nope, but it means that what should be sequential IO is probably
> >going to be random. i.e. instead of directory/inode/extent reading
> >IO having minimum track-to-track seek latency because the blocks are
> >all nearby (1-2ms), they'll be average seeks (6-7ms) because locality
> >is no longer what the filesystem has optimised for.
> >
> 
> Shouldn't the thinp overhead be minimized by the big (8 MB) chunk size?

Minimised - maybe. Removed - no.

> Are inode allocations really so scattered across LBAs?

Yes. XFS distributes inodes across the entire device LBA space.

> Maybe the
> slowdown is worsened by bad journal placement (I imagine it is near
> the start of the disk, while current read/write activity surely
> happens near the end)?

Contributing factor, yes. You just have to live with that thinp
behaviour.

> >noalign affects data placement only, and only for filesystems that
> >have a stripe unit/width set, which yours doesn't:
> >
> >>			sunit=0      swidth=0 blks
> 
> Isn't that the expected result of "noalign"?

No. "noalign" is a mount option - the sunit/swidth are geometry
values stored in the superblock. noalign will override the
superblock values, but it does not make them go away.

> By opting for "noalign"
> I am telling mkfs to discard any stripe information, right?

No. You are telling it to ignore stripe alignment for file data
allocation purposes.
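
i.e. the geometry stays in the superblock and is still reported by
xfs_info; the mount option only changes allocation behaviour. Roughly
(mount point and device names below are just placeholders):

  xfs_info /your/mountpoint | grep sunit          # on-disk geometry
  mount -o noalign /dev/your-lv /your/mountpoint  # ignore it for data allocation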

> >Yes. Made worse by being on a thinp volume.
> 
> Can't I do anything about that?

Nope. There's always going to be a penalty for subverting the
filesystem's physical layout optimisations on storage subsystems
that require physical layout optimisation for performance.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Slow file stat/deletion
  2016-11-28 21:53     ` Dave Chinner
@ 2016-11-29  7:53       ` Gionatan Danti
  2017-04-28 20:14         ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2016-11-29  7:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Gionatan Danti



On 28/11/2016 22:53, Dave Chinner wrote:
>
> Nope. There's always going to be a penalty for subverting the
> filesystem's physical layout optimisations on storage subsystems
> that require physical layout optimisation for performance.
>
> Cheers,
>
> Dave.
>

Ok, very clear.

Thank you for taking the time to explain, Dave.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: Slow file stat/deletion
  2016-11-29  7:53       ` Gionatan Danti
@ 2017-04-28 20:14         ` Gionatan Danti
  2017-04-28 21:03           ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2017-04-28 20:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Hi all,
this is a note that may be useful to anyone searching for the same
thing: using "ftype=1" at mkfs time (which is, by the way, the default
for CRC-enabled filesystems) greatly speeds up stat/find operations.

On a test case with 1,000,000 files and 1000 total directories, the
"find /mnt/xfs" time dropped from ~70 to ~20 seconds.

As a side question: there is no way to enable ftype=1 on an already
existing filesystem, right?
Thanks.

On 29-11-2016 08:53 Gionatan Danti wrote:
> On 28/11/2016 22:53, Dave Chinner wrote:
>> 
>> Nope. There's always going to be a penalty for subverting the
>> filesystem's physical layout optimisations on storage subsystems
>> that require physical layout optimisation for performance.
>> 
>> Cheers,
>> 
>> Dave.
>> 
> 
> Ok, very clear.
> 
> Thank you for taking the time to explain, Dave.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: Slow file stat/deletion
  2017-04-28 20:14         ` Gionatan Danti
@ 2017-04-28 21:03           ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2017-04-28 21:03 UTC (permalink / raw)
  To: Gionatan Danti, Dave Chinner; +Cc: linux-xfs

On 4/28/17 3:14 PM, Gionatan Danti wrote:
> Hi all,
> this is a note that may be useful to anyone searching for the same thing: using "ftype=1" at mkfs time (which is, by the way, the default for CRC-enabled filesystems) greatly speeds up stat/find operations.
> 
> On a test case with 1,000,000 files and 1000 total directories, the "find /mnt/xfs" time dropped from ~70 to ~20 seconds.
> 
> As a side question: there is no way to enable ftype=1 on an already existing filesystem, right?
> Thanks.

That is correct.

-Eric


