* LWN.net article: creating 1 billion files -> XFS loses
@ 2010-08-19 11:12 Michael Monnerie
  2010-08-19 12:05 ` Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Michael Monnerie @ 2010-08-19 11:12 UTC (permalink / raw)
  To: xfs



The subject is a bit harsh, but overall the article says:
XFS is slowest on creating and deleting a billion files
XFS fsck needs 30GB RAM to fsck that 100TB filesystem.

http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 11:12 LWN.net article: creating 1 billion files -> XFS loses Michael Monnerie
@ 2010-08-19 12:05 ` Christoph Hellwig
  2010-08-19 12:45   ` Michael Monnerie
  2010-08-19 13:10 ` Emmanuel Florac
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2010-08-19 12:05 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Thu, Aug 19, 2010 at 01:12:45PM +0200, Michael Monnerie wrote:
> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
> 
> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

The creation and deletion performance is a known issue, and to a large
extent fixed by the new delaylog code.  We're not quite as fast as ext4
yet, but it's getting close.

The repair result looks a lot like the pre-3.1.0 xfsprogs repair.
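
For anyone who wants to try it: on a 2.6.35 or later kernel, delayed
logging is just a mount option. Something like the following should do
(the device path and log buffer size here are only examples):

  # mount -t xfs -o delaylog,logbsize=262144 /dev/sdX /mnt

The options shown in /proc/mounts should then include delaylog.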


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 12:05 ` Christoph Hellwig
@ 2010-08-19 12:45   ` Michael Monnerie
  2010-08-19 13:55     ` Stan Hoeppner
  2010-08-20  7:55     ` Dave Chinner
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Monnerie @ 2010-08-19 12:45 UTC (permalink / raw)
  To: xfs



On Thursday, 19 August 2010, Christoph Hellwig wrote:
> The creation and deletion performance is a known issue, and to a
> large extent fixed by the new delaylog code.  We're not quite as
> fast as ext4 yet, but it's getting close.
> 
> The repair result looks a lot like the pre-3.1.0 xfsprogs repair.
 
Yes, I know. I thought some XFS dev might contact the author to do some 
re-testing, as a reputation is quickly destroyed by such articles and 
takes a long time to rebuild. Just this week, in a filesystem 
discussion, a friend asked "isn't XFS destroying/zeroing files on power 
failure?". That information is ancient, but things like that stay in 
people's brains (almost) forever.


-- 
with kind regards,
Michael Monnerie, Ing. BSc

* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 11:12 LWN.net article: creating 1 billion files -> XFS loses Michael Monnerie
  2010-08-19 12:05 ` Christoph Hellwig
@ 2010-08-19 13:10 ` Emmanuel Florac
  2010-09-06 13:42 ` Emmanuel Florac
  2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
  3 siblings, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-08-19 13:10 UTC (permalink / raw)
  To: xfs



On Thu, 19 Aug 2010 13:12:45 +0200,
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.

Too bad I haven't got a 100 TB machine at hand. However, I have a 24TB
system dedicated to tests. I'm pretty sure we can do much better with
XFS and the proper mount options :)

In fact, I have an unused 40 TB array too. More on that one later...

Stay tuned :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 12:45   ` Michael Monnerie
@ 2010-08-19 13:55     ` Stan Hoeppner
  2010-08-20  7:55     ` Dave Chinner
  1 sibling, 0 replies; 22+ messages in thread
From: Stan Hoeppner @ 2010-08-19 13:55 UTC (permalink / raw)
  To: xfs

Michael Monnerie put forth on 8/19/2010 7:45 AM:
> Just this week, in a filesystem discussion, a friend asked "isn't 
> XFS destroying/zeroing files on power failure?". That information 
> is ancient, but things like that stay in people's brains (almost) 
> forever.

Had a similar lengthy discussion over on debian-users not more than a month or
so ago.  Same thing there.  Of the 10 or so people active in the thread, I'd
say 8 of them were anti-XFS because of the "corruption due to power failure"
issue that they'd "read about" years before.  Not a single one of them had
ever used XFS.  A couple of them considered it a "hobbyist quality" filesystem
that might be ready for production use in a few years.  Ah, the ignorance
that abounds in our world...

I did my best to educate them, sending them to the Wikipedia page on XFS and
to the xfs.org site, specifically the relevant sections of the FAQ.
Unfortunately there are some people who simply refuse to be educated.  But
those types of people aren't candidates for XFS anyway, thankfully. ;)

-- 
Stan


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 12:45   ` Michael Monnerie
  2010-08-19 13:55     ` Stan Hoeppner
@ 2010-08-20  7:55     ` Dave Chinner
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2010-08-20  7:55 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Thu, Aug 19, 2010 at 02:45:22PM +0200, Michael Monnerie wrote:
> On Thursday, 19 August 2010, Christoph Hellwig wrote:
> > The creation and deletion performance is a known issue, and to a
> > large extent fixed by the new delaylog code.  We're not quite as
> > fast as ext4 yet, but it's getting close.
> > 
> > The repair result looks a lot like the pre-3.1.0 xfsprogs repair.
>  
> Yes, I know. I thought some XFS dev might contact the author to do some 
> re-testing, as a reputation is quickly destroyed by such articles and 
> takes a long time to rebuild. Just this week, in a filesystem 
> discussion, a friend asked "isn't XFS destroying/zeroing files on power 
> failure?". That information is ancient, but things like that stay in 
> people's brains (almost) forever.

Don't worry too much - I have the details of the test that was run
and already know why XFS appeared so slow: it was single threaded.
That alone means XFS will be slower to create 1b files regardless
of any other detail.

Look at it this way - the initial numbers I'm seeing on my test rig
are sustained create rates of about 8,000/s with default mkfs/mount
options (i.e. no tuning, no delayed logging, 32k logbsize, etc) and
it is burning exactly one of 8 CPUs in the VM. I know I can get an
order of magnitude better performance out of XFS on this VM....

It'll take me a few days to run the numbers to be able to write a
solid reply, but I have every confidence that a "create 1b inodes"
benchmark tuned to XFS's strengths rather than one designed to avoid
ext4's weaknesses will show very, very different results.
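
To give an idea of what a benchmark tuned to XFS's strengths looks
like: the main trick is simply to run several creators in parallel,
each in its own directory, so the allocations spread across
allocation groups. A minimal sketch (paths and counts are only
illustrative):

  # for i in $(seq 0 7); do
  >     mkdir /mnt/scratch/dir$i
  >     ( cd /mnt/scratch/dir$i &&
  >       seq -f "f%.0f" 1 1000000 | xargs touch ) &
  > done
  # wait

Each background loop creates a million empty files in its own
directory, and xargs batches the names so we don't fork one touch
per file.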

In the meantime, there is no need to start a flamewar. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-08-19 11:12 LWN.net article: creating 1 billion files -> XFS loses Michael Monnerie
  2010-08-19 12:05 ` Christoph Hellwig
  2010-08-19 13:10 ` Emmanuel Florac
@ 2010-09-06 13:42 ` Emmanuel Florac
  2010-09-06 22:04   ` Dave Chinner
  2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
  3 siblings, 1 reply; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-06 13:42 UTC (permalink / raw)
  To: xfs

On Thu, 19 Aug 2010 13:12:45 +0200,
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.

To follow up on this subject: a colleague (following my suggestion :)
tried to create 1 billion files in a single XFS directory.
Unfortunately, directories themselves don't scale well that far:
after 1 million files in the first 30 minutes, file creation slowed down
gradually, so after 100 hours we had about 230 million files. The
directory size at that point was 5.3 GB.

Now we're starting afresh with 1000 directories with 1 million files
each :)

(Kernel version used: vanilla 2.6.32.11 x86_64 SMP)

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>

* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-06 13:42 ` Emmanuel Florac
@ 2010-09-06 22:04   ` Dave Chinner
  2010-09-06 22:58     ` Michael Monnerie
  2010-09-07  6:46     ` Emmanuel Florac
  0 siblings, 2 replies; 22+ messages in thread
From: Dave Chinner @ 2010-09-06 22:04 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On Mon, Sep 06, 2010 at 03:42:54PM +0200, Emmanuel Florac wrote:
> On Thu, 19 Aug 2010 13:12:45 +0200,
> Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> 
> > The subject is a bit harsh, but overall the article says:
> > XFS is slowest on creating and deleting a billion files
> > XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
> 
> To follow up on this subject: a colleague (following my suggestion :)
> tried to create 1 billion files in a single XFS directory.
> Unfortunately, directories themselves don't scale well that far:
> after 1 million files in the first 30 minutes, file creation slowed down
> gradually, so after 100 hours we had about 230 million files. The
> directory size at that point was 5.3 GB.

Oh, that's larger than I've ever run before ;)

Try using:

# mkfs.xfs -d size=64k

Will speed up large directory operations by at least an order of
magnitude.

> Now we're starting afresh with 1000 directories with 1 million files
> each :)

Which is exactly the test that was used to generate the numbers that
were published.

> (Kernel version used: vanilla 2.6.32.11 x86_64 SMP)

Not much point in testing that kernel - delayed logging is where the
future is for this sort of workload, which is what I'm testing.

FWIW, I'm able to create 50 million inodes in under 14 minutes with
delayed logging and 8 threads using directories of 100k entries.

The run to 1 billion inodes that I started late last night (10 hours
in) has just passed 700M inodes on a 16TB filesystem.  It's running
at about 25,000 creates/s, but it is limited by bad shrinker
behaviour that completely trashes the dentry cache, causing
~3000 read IOPS to reload dentries that are still needed for
operation. It should be running about 3-4x faster than that.

FYI, the reason I'm taking a while to get numbers is that parallel
create workloads of this scale are showing significant problems (VM
livelocks, shrinker misbehaviour, lock contention in IO completion
processing, buffer cache hash scaling issues, etc) and I'm trying to
fix them as I go - these metadata workloads are completely
unexplored territory....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-06 22:04   ` Dave Chinner
@ 2010-09-06 22:58     ` Michael Monnerie
  2010-09-07  3:31       ` Dave Chinner
  2010-09-07  6:46     ` Emmanuel Florac
  1 sibling, 1 reply; 22+ messages in thread
From: Michael Monnerie @ 2010-09-06 22:58 UTC (permalink / raw)
  To: xfs



On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -d size=64k
> 
> Will speed up large directory operations by at least an order of
> magnitude.
 
I've read the man page for mkfs.xfs, but I couldn't find out whether using 
mkfs.xfs -d su=64k,sw=2
would be a redundant (and superior) option for that. I'd guess so, 
reading the description of sunit:

sunit=value
This is used to specify the stripe unit for a RAID device or a 
logical volume. The value has to be specified in 512-byte block units. 
Use the su suboption to specify the stripe unit size in bytes. This 
suboption ensures that data allocations will be stripe unit aligned when 
the current end of file is being extended and the file size is larger 
than 512KiB. Also inode allocations and the internal log will 
be stripe unit aligned.

Or would I still need to use size=64k?

-- 
with kind regards,
Michael Monnerie, Ing. BSc

* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-06 22:58     ` Michael Monnerie
@ 2010-09-07  3:31       ` Dave Chinner
  2010-09-07  6:20         ` Michael Monnerie
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2010-09-07  3:31 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Tue, Sep 07, 2010 at 12:58:40AM +0200, Michael Monnerie wrote:
> On Tuesday, 7 September 2010, Dave Chinner wrote:
> > # mkfs.xfs -d size=64k
> > 
> > Will speed up large directory operations by at least an order of
> > magnitude.
>  
> I've read the man page for mkfs.xfs, but I couldn't find out whether using 
> mkfs.xfs -d su=64k,sw=2

Sorry, I screwed that up, it should have read:

# mkfs.xfs -n size=64k

(-n = naming = directories. -d = data != directories)
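
Note that the two are independent knobs, so stripe geometry and
directory block size can be set in the same invocation, e.g. (device
and stripe geometry here are only examples):

  # mkfs.xfs -d su=64k,sw=2 -n size=64k /dev/sdX

-d su/sw only aligns data allocation to the RAID stripe; it does not
change the directory block size.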

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-07  3:31       ` Dave Chinner
@ 2010-09-07  6:20         ` Michael Monnerie
  2010-09-07  7:01           ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Michael Monnerie @ 2010-09-07  6:20 UTC (permalink / raw)
  To: xfs



On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -n size=64k
> (-n = naming = directories. -d = data != directories)

Thank you, Dave. Do I interpret that parameter right?

When a new directory is created, by default it would occupy only 4KB; 
with -n size=64k, 64KB would be reserved. As the directory fills, space 
within that block is used, so in the default case after 4KB (how many 
entries would that be, roughly? At 256 bytes per inode, 16 entries?) XFS 
would allocate the next block, while in your case 256 entries would fit.

That would keep directory fragmentation lower and, with today's disks, 
take minimally more space, so it sounds very good to use that option, 
especially with RAIDs, where stripes are usually 64KB or bigger. Or 
would the waste of space be so big that it could hurt?

Last question: is there a way to set that option on an existing XFS?

-- 
with kind regards,
Michael Monnerie, Ing. BSc

* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-06 22:04   ` Dave Chinner
  2010-09-06 22:58     ` Michael Monnerie
@ 2010-09-07  6:46     ` Emmanuel Florac
  1 sibling, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-07  6:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 7 Sep 2010 08:04:10 +1000, you wrote:

> Oh, that's larger than I've ever run before ;)

Excellent :) It still works fine afterwards; mount, umount, etc. work
flawlessly. Memory consumption, though, is huge :)
> 
> Try using:
> 
> # mkfs.xfs -d size=64k
> 
> Will speed up large directory operations by at least an order of
> magnitude.

OK, we'll try that too :)
 
> > Now we're starting afresh with 1000 directories with 1 million files
> > each :)
> 
> Which is exactly the test that was used to generate the numbers that
> were published.
> 
> > (Kernel version used: vanilla 2.6.32.11 x86_64 SMP)
> 
> Not much point in testing that kernel - delayed logging is where the
> future is for this sort of workload, which is what I'm testing.

I'll compile a 2.6.36rc for comparison.

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>

* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-07  6:20         ` Michael Monnerie
@ 2010-09-07  7:01           ` Dave Chinner
  2010-09-08  5:42             ` Michael Monnerie
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2010-09-07  7:01 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Tue, Sep 07, 2010 at 08:20:07AM +0200, Michael Monnerie wrote:
> On Tuesday, 7 September 2010, Dave Chinner wrote:
> > # mkfs.xfs -n size=64k
> > (-n = naming = directories. -d = data != directories)
> 
> Thank you, Dave. Do I interpret that parameter right?
> 
> When a new directory is created, by default it would occupy only 4KB; 
> with -n size=64k, 64KB would be reserved.

No, it allocates 64k blocks for the directory instead of 4k blocks.

> As the directory fills, space 
> within that block is used, so in the default case after 4KB (how many 
> entries would that be, roughly? At 256 bytes per inode, 16 entries?) XFS 
> would allocate the next block, while in your case 256 entries would fit.

Inodes are not stored in the directory structure, only the directory
entry name and the inode number. Hence the amount of space used by a
directory entry is determined by the length of the name.

> That would keep directory fragmentation lower and, with today's disks, 
> take minimally more space, so it sounds very good to use that option, 
> especially with RAIDs, where stripes are usually 64KB or bigger. Or 
> would the waste of space be so big that it could hurt?

Well, there is extra overhead to allocate large directory blocks (16
pages instead of one, to begin with, then there's the vmap overhead,
etc), so for small directories smaller block sizes are faster for
create and unlink operations.

For empty directories, operations on 4k block sized directories
consume roughly 50% less CPU than 64k block size directories. The
4k block size directories consume less CPU up to roughly 1.5
million entries, where the two are roughly equal. At directory sizes
of 10 million entries, 64k directory block operations consume
about 15% of the CPU that 4k directory block operations consume.

In terms of lookups, the 64k block directory will take less IO but
consume more CPU for a given lookup. Hence it depends on your IO
latency and whether directory readahead can hide that latency as to
which will be faster. e.g. For SSDs, CPU usage might be the limiting
factor, not the IO. Right now I don't have any numbers on what
the difference might be - I'm getting 1B inode population issues worked
out first before I start on measuring cold cache lookup times on 1B
files....

> Last question: is there a way to set that option on an existing XFS?

No, it is a mkfs time parameter, though we have been discussing the
possibility of being able to set it per-directory (at mkdir time
when no blocks have been allocated).
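
For the record, you can check what an existing filesystem was made
with via xfs_info; the "naming" section reports the directory block
size. The output below is abbreviated and only illustrative:

  # xfs_info /mnt/scratch | grep naming
  naming   =version 2              bsize=65536  ascii-ci=0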

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: LWN.net article: creating 1 billion files -> XFS loses
  2010-09-07  7:01           ` Dave Chinner
@ 2010-09-08  5:42             ` Michael Monnerie
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Monnerie @ 2010-09-08  5:42 UTC (permalink / raw)
  To: xfs



On Tuesday, 7 September 2010, Dave Chinner wrote:
> # mkfs.xfs -n size=64k
 
That explanation was worth a FAQ entry, to keep as a reference:

http://xfs.org/index.php/XFS_FAQ#Q:_Performance:_mkfs.xfs_-n_size.3D64k_option

Thanks, Dave.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-08-19 11:12 LWN.net article: creating 1 billion files -> XFS loses Michael Monnerie
                   ` (2 preceding siblings ...)
  2010-09-06 13:42 ` Emmanuel Florac
@ 2010-09-16 10:13 ` Emmanuel Florac
  2010-09-16 21:53   ` Stan Hoeppner
  2010-09-17 19:57   ` Peter Grandi
  3 siblings, 2 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-16 10:13 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Thu, 19 Aug 2010 13:12:45 +0200,
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:

> The subject is a bit harsh, but overall the article says:
> XFS is slowest on creating and deleting a billion files
> XFS fsck needs 30GB RAM to fsck that 100TB filesystem.
> 
> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

So we've made a test with 1KB files (space, space...) and a production
kernel: 2.6.32.11 (yeah, I know, 2.6.38 should be faster, but you know,
we upgrade our production kernels prudently :). 

mk1BFiles will create and delete 1000000000 files with 32 threads
Version: v0.2.4-10-gf6decd3, build: Sep  7 2010 13:39:34

Creating 1000000000 files, started at 2010-09-07 13:45:16...
Done, time spent: 89:35:12.262

Doing `ls -R`, started at 2010-09-11 07:20:28...
Stat: ls (pid: 18844) status: ok, returned value: 0
Cpu usage: user: 1:27:47.242, system: 20:18:21.689
Max rss: 229.01 MBytes, page fault: major: 4, minor: 58694

Compute size used by 1000000000 files, started at 2010-09-12 09:30:52...
Size used by files: 11.1759 TBytes
Size used by directory: 32.897 GBytes
Size used (total): 11.2080 TBytes
Done, time spent: 25:50:32.355

Deleting 1000000000 files, started at 2010-09-13 11:21:24...
Done, time spent: 68:37:38.117

Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
x86_64...

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>

* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
@ 2010-09-16 21:53   ` Stan Hoeppner
  2010-09-17  7:54     ` Michael Monnerie
                       ` (2 more replies)
  2010-09-17 19:57   ` Peter Grandi
  1 sibling, 3 replies; 22+ messages in thread
From: Stan Hoeppner @ 2010-09-16 21:53 UTC (permalink / raw)
  To: xfs

Emmanuel Florac put forth on 9/16/2010 5:13 AM:

> Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
> x86_64...

This is a test of storage system performance, and you left out the
storage array specs?  By doing so it seems you're stating the underlying
storage is not relevant to the results.

So, are you saying I should be able to duplicate your results with that
dual Opty system, but using an md RAID0 stripe over 8x2TB SATA disks
connected to two $60 4 port SiI 3124 PCIe x1 cards?

-- 
Stan




* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 21:53   ` Stan Hoeppner
@ 2010-09-17  7:54     ` Michael Monnerie
  2010-09-17 19:29     ` Peter Grandi
  2010-09-18 11:16     ` Emmanuel Florac
  2 siblings, 0 replies; 22+ messages in thread
From: Michael Monnerie @ 2010-09-17  7:54 UTC (permalink / raw)
  To: xfs; +Cc: Stan Hoeppner



On Thursday, 16 September 2010, Stan Hoeppner wrote:
> So, are you saying I should be able to duplicate your results with
>  that dual Opty system, but using an md RAID0 stripe over 8x2TB SATA
>  disks connected to two $60 4 port SiI 3124 PCIe x1 cards?
 
According to Dave, with his patches you should even outperform that if 
you have faster CPUs :-)

Emmanuel, where is the "mk1BFiles" benchmark? We're planning for new 
hardware this year, so this would be a good time to run it. Could I 
have the script?

The output misses the time "ls" took, but one can infer it from the 
start of the next test: 2010-09-11 07:20 to 2010-09-12 09:30, so 
roughly 26 hours. More than a day just to list all files, ugh. I guess 
it will take some years until we want to have such a filesystem. Either 
hardware must become quicker, or another wonderful new patch is needed.

-- 
// Michael Monnerie, Ing.BSc.
----------------------------------
Sorcerers have their magic wands:
  powerful, potentially dangerous tools with a life of their own.
Witches have their familiars:
  creatures disguised as household beasts that could,
  if they choose, wreak the witches' havoc.
Mystics have their golems:
  beings built of wood and tin brought to life to do their
  masters' bidding.
I have Linux.
----------------------------------


* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 21:53   ` Stan Hoeppner
  2010-09-17  7:54     ` Michael Monnerie
@ 2010-09-17 19:29     ` Peter Grandi
  2010-09-18 11:25       ` Emmanuel Florac
  2010-09-18 11:16     ` Emmanuel Florac
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Grandi @ 2010-09-17 19:29 UTC (permalink / raw)
  To: Linux XFS


[ ... useless run of something misrepresented as a test ... ]

>> Test run on a dual Opteron quad core, 16 GB RAM, kernel
>> 2.6.32.11 x86_64...

> This is a test of storage system performance, and you left out
> the storage array specs?  By doing so it seems you're stating
> the underlying storage is not relevant to the results. [ ... ]

This is only one of the several aspects of the waste of time
that was misrepresented as a storage test, even if it is one of
the funniest.


* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 10:13 ` LWN.net article: creating 1 billion files -> Tests we did Emmanuel Florac
  2010-09-16 21:53   ` Stan Hoeppner
@ 2010-09-17 19:57   ` Peter Grandi
  2010-09-18 11:39     ` Emmanuel Florac
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Grandi @ 2010-09-17 19:57 UTC (permalink / raw)
  To: Linux XFS


>> The subject is a bit harsh, but overall the article says: XFS
>> is slowest on creating and deleting a billion files; XFS fsck
>> needs 30GB RAM to fsck that 100TB filesystem.

Hahahaha. Very funny. So what?

>> http://lwn.net/SubscriberLink/400629/3fb4bc34d6223b32/

LWN is usually fairly decent, but I have noticed it does
occasionally waste pixels/bits on things that the author(s)
misrepresent as storage or filesystem tests.

However, in this case the main takeaway of the presentation
reported on is that it is just a bad idea to assume that file systems
can scale to large collections of small files as well as DBMSes
designed for that purpose can. So what?

> So we've made a test with 1KB files (space, space...) and a
> production kernel: 2.6.32.11 (yeah, I know, 2.6.38 should be
> faster, but you know, we upgrade our production kernels prudently :).

Why is this a test of anything other than how to waste time?

> mk1BFiles will create and delete 1000000000 files with 32
> threads Version: v0.2.4-10-gf6decd3, build: Sep 7 2010
> 13:39:34

> Creating 1000000000 files, started at 2010-09-07 13:45:16...
> Done, time spent: 89:35:12.262

Was there any intervening cache flush?

> Doing `ls -R`, started at 2010-09-11 07:20:28...
> Stat: ls (pid: 18844) status: ok, returned value: 0
> Cpu usage: user: 1:27:47.242, system: 20:18:21.689
> Max rss: 229.01 MBytes, page fault: major: 4, minor: 58694

Was there any intervening cache flush?

> Compute size used by 1000000000 files, started at 2010-09-12 09:30:52...
> Size used by files: 11.1759 TBytes
> Size used by directory: 32.897 GBytes
> Size used (total): 11.2080 TBytes
> Done, time spent: 25:50:32.355

Was there any intervening cache flush?

> Deleting 1000000000 files, started at 2010-09-13 11:21:24...
> Done, time spent: 68:37:38.117

Was there any intervening cache flush?

Why would anybody with even a little knowledge of computers and
systems want to use a filesystem as a database for small records?

> Test run on a dual Opteron quad core, 16 GB RAM, kernel 2.6.32.11
> x86_64...

So what?

Some of the most amusing quotes from the LWN article are from the
comments.

 "Recently I did similiar tests for determining how well PostgreSQL
  would be able to deal with databases with potentially hundreds of
  thousands of tables. From what I found out, it's only limited by
  the file system's ability to work with that many files in a
  single directory."

HHAHAHAHAHAHAHA.

 "> But in what situations will it make more sense to not group a
  > billion of file items into logical groups?
 
  Things like squid cache directories, git object directories,
  ccache cache directories, that hidden thumbnails directory in
  your $HOME... They all have in common that the files are named
  by a hash or something similar. There is no logical grouping at
  all here; it is a completely flat namespace."

AAAAAAGGGGGHHHHHHHHHHH.

But the original presentation has absolutely the funniest bit:

 "Why Not Use a Database?
  ● Users and system administrators are familiar
    with file systems
      Backup, creation, etc are all well understood
  ● File systems handle partial failures pretty well
      Being able to recover part of the stored data is
      useful for some applications
  ● File systems are “cheap” since they come with
    your operating system!"

My evil translation of that is "because so many sysadms and
programmers are incompetent and stupid and wish for ponies".

Of course the best bit is where someone :-) was quoted making
sense:

  “Millions of files may work; but 1 billion is an utter
  absurdity. A filesystem that can store reasonably 1 billion
  small files in 7TB is an unsolved research issue...,”

The stupidest bit of the presentation was part of the quoted
reply:

  “Strangely enough, I have been testing ext4 and stopped filling
  it at a bit over 1 billion 20KB files on Monday (with 60TB of
  storage). Running fsck on it took only 2.4 hours.”

Where the idea that the 'fsck' time that matters is that of a
freshly created (and was the page cache flushed?), uncorrupted
filesystem is intensely comical. "Possible" does not mean
"reasonably". Just delirious.


* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-16 21:53   ` Stan Hoeppner
  2010-09-17  7:54     ` Michael Monnerie
  2010-09-17 19:29     ` Peter Grandi
@ 2010-09-18 11:16     ` Emmanuel Florac
  2 siblings, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-18 11:16 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Thu, 16 Sep 2010 16:53:07 -0500, you wrote:

> This is a test of storage system performance, and you left out the
> storage array specs?  By doing so it seems you're stating the
> underlying storage is not relevant to the results.

Sorry :) The storage is a RAID-6 array of 24 2TB disks on a 3ware 9650.
Not exactly stellar at IOPS performance.
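
(For such an array one would normally also tell mkfs the stripe
geometry; with a hypothetical 64k chunk and 22 data disks - 24 minus
2 for parity - that would be something like:

  # mkfs.xfs -d su=64k,sw=22 /dev/sdX

but the chunk size here is only an assumption.)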

> So, are you saying I should be able to duplicate your results with
> that dual Opty system, but using an md RAID0 stripe over 8x2TB SATA
> disks connected to two $60 4 port SiI 3124 PCIe x1 cards?
> 

Your setup may be slightly slower.

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>

* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-17 19:29     ` Peter Grandi
@ 2010-09-18 11:25       ` Emmanuel Florac
  0 siblings, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-18 11:25 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS

On Fri, 17 Sep 2010 20:29:01 +0100, you wrote:

> [ ... useless run of something misrepresented as a test ... ]

I won't comment on the usefulness of your rant, though, because I'd
rather stay amiable.

If you don't mind, I'll nevertheless post complementary results of the
same test run on different filesystems on the very same hardware,
because apparently you missed the fact that this was about tracking
comparative XFS progress on some metadata-intensive workloads. 

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>

* Re: LWN.net article: creating 1 billion files -> Tests we did
  2010-09-17 19:57   ` Peter Grandi
@ 2010-09-18 11:39     ` Emmanuel Florac
  0 siblings, 0 replies; 22+ messages in thread
From: Emmanuel Florac @ 2010-09-18 11:39 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS

On Fri, 17 Sep 2010 20:57:48 +0100, you wrote:

> LWN is usually fairly decent, but I have noticed it does
> occasionally waste pixels/bits on things that the author(s)
> misrepresent as storage or filesystem tests.

How unfortunate that we missed your precious stance on the matter.
Everybody knows that benchmarks are mostly useless /per se/; however,
comparative benchmarks often reveal interesting differences.

As for the interest of pushing something to the limit for its own
sake, such an experiment may equally reveal interesting bugs. The fact
that none of the filesystems in this test simply *failed* under the
load is by itself revealing of the overall robustness and stability of
the filesystems, the VFS, and the Linux kernel. 

As a side note, were you to use a slightly less harsh tone, people
would probably be less reluctant to discuss these points more
deeply.

-- 
Emmanuel Florac | Intellique | <eflorac@intellique.com>
