* Best way to store billions of files
From: Roland Rabben @ 2010-08-01 12:08 UTC
  To: ceph-devel

I am researching alternatives to GlusterFS, which I am currently using.
My need is to store billions of files (big and small), and I am trying
to find out if there are any considerations I should make when
planning folder structure and server config using Ceph.

On my GlusterFS system things seem to slow down dramatically as I
grow the number of files. A simple ls takes forever. So I am looking
for alternatives.

Right now my folder structure looks like this:

Users are grouped into folders named /000, /001, ... /999, using a hash.
Each user has its own folder inside the numbered folders.
Inside each user folder, the user's files are stored in folders named
/000, /001, ... /999, also using a hash.
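
In pseudo-code, a file's path is built roughly like this (just an
illustrative sketch -- the hash below is a stand-in for the one we
actually use):

import hashlib

def bucket(key):
    # Map a key to one of the 1000 folders /000 ... /999.
    # MD5 is only an example; the real hash may differ.
    return "%03d" % (int(hashlib.md5(key.encode()).hexdigest(), 16) % 1000)

def path_for(user_id, file_name):
    # /<hash(user)>/<user>/<hash(file)>/<file>
    return "/%s/%s/%s/%s" % (bucket(user_id), user_id,
                             bucket(file_name), file_name)

# path_for("alice", "photo.jpg") -> "/NNN/alice/MMM/photo.jpg"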

Would this folder structure or the number of files become a problem using Ceph?

I generally use 4U storage nodes with 36 x 1.5 TB or 2 TB SATA drives,
an 8-core CPU, and 6 GB of RAM. My application is write-once, read-many.
What recommendations would you give with regard to setting up the
filesystems on the storage nodes? ext3? ext4? LVM? RAID?

Today I am mounting all disks as individual ext3 partitions and tying
them together with GlusterFS. Would this work with Ceph, or would you
recommend making one large LVM volume on each storage node and
exposing that to Ceph?

I know Ceph is not production-ready yet, but from the activity on this
mailing list things look promising.

Best regards
Roland Rabben


* Re: Best way to store billions of files
From: Gregory Farnum @ 2010-08-01 16:17 UTC
  To: Roland Rabben; +Cc: ceph-devel

On Sun, Aug 1, 2010 at 5:08 AM, Roland Rabben <roland@jotta.no> wrote:
> I know Ceph is not production-ready yet, but from the activity on this
> mailing list things look promising.
As you note, Ceph is definitely not production-ready yet. Part of this
means that its testing in large-scale environments is limited, so
there may be bugs or unexpected behaviors. That said:

> I am researching alternatives to GlusterFS, which I am currently using.
> My need is to store billions of files (big and small), and I am trying
> to find out if there are any considerations I should make when
> planning folder structure and server config using Ceph.
>
> On my GlusterFS system things seem to slow down dramatically as I
> grow the number of files. A simple ls takes forever. So I am looking
> for alternatives.
>
> Right now my folder structure looks like this:
>
> Users are grouped into folders named /000, /001, ... /999, using a hash.
> Each user has its own folder inside the numbered folders.
> Inside each user folder, the user's files are stored in folders named
> /000, /001, ... /999, also using a hash.
>
> Would this folder structure or the number of files become a problem using Ceph?
This structure *should* definitely be okay for Ceph -- it stores
dentries as part of the containing inode, and the metadata servers
make extensive use of an in-memory cache, so an ls will generally
require either zero or one on-disk lookups.
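
To illustrate the idea (a toy model only, not Ceph's actual data
structures): because a directory's entries are stored together, listing
it costs at most one fetch, and zero once it's cached.

# Toy model of "dentries stored with the directory" -- not Ceph code.
class ToyMDS:
    def __init__(self, disk):
        self.disk = disk        # dir path -> {name: inode-ish metadata}
        self.cache = {}         # in-memory copy of recently used dirs
        self.disk_reads = 0

    def ls(self, dirpath):
        if dirpath not in self.cache:
            # One read pulls in *all* entries of the directory at once.
            self.cache[dirpath] = self.disk[dirpath]
            self.disk_reads += 1
        return sorted(self.cache[dirpath])

mds = ToyMDS({"/000/alice": {"photo.jpg": {}, "notes.txt": {}}})
mds.ls("/000/alice")   # first ls: one disk read
mds.ls("/000/alice")   # repeat ls: served entirely from cache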

> I generally use 4U storage nodes with 36 x 1.5 TB or 2 TB SATA drives,
> an 8-core CPU, and 6 GB of RAM. My application is write-once, read-many.
> What recommendations would you give with regard to setting up the
> filesystems on the storage nodes? ext3? ext4? LVM? RAID?
Again, going back to the limited testing, I don't think anybody knows
what the best configuration will be with such disk-heavy nodes. But
you'll definitely want to run btrfs on the storage nodes, as it
supports a number of features that let Ceph run faster under some
circumstances and recover more reliably.
Speculating on the best-performing configuration is hard without
knowing your usage patterns, but given the limited memory I would
probably create 2 or 3 btrfs volumes across all the disks (reserving
one extra disk per volume to use as a journal), and then run one OSD
per volume (with an appropriately configured CRUSH map to prevent
replicating data onto the same node!). If you can expand the memory
above 12 GB, I'd stuff the machine full and then run one OSD per 2 GB
(or maybe 1 GB, though your caching will be weaker), partitioning the
drives with btrfs accordingly.
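
Back-of-the-envelope, for a 36-drive box that works out to something
like this (just arithmetic on the numbers above, not a tested
configuration):

# Rough arithmetic for a 36-drive node -- not a tested configuration.
def layout(total_disks, num_osds, ram_gb):
    journal_disks = num_osds  # one journal disk per OSD/btrfs volume
    data_disks_per_volume = (total_disks - journal_disks) // num_osds
    return {
        "osds": num_osds,
        "data disks per btrfs volume": data_disks_per_volume,
        "journal disks": journal_disks,
        "RAM per OSD (GB)": float(ram_gb) / num_osds,
    }

print(layout(36, 3, 6))    # 3 OSDs: 11 data disks + 1 journal each, 2 GB RAM per OSD
print(layout(36, 12, 24))  # with more RAM: 12 OSDs, 2 data disks + 1 journal each, 2 GB per OSD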

> Today I am mounting all disks as individual ext3 partitions and tying
> them together with GlusterFS. Would this work with Ceph, or would you
> recommend making one large LVM volume on each storage node and
> exposing that to Ceph?
This would be a bad idea with Ceph; you'll need to combine the disks
into logical volumes (as I said, btrfs can do this). The reason is
that Ceph can only handle one data directory per OSD instance, and you
don't want to stuff 36 OSDs into 8 cores and 6 GB of RAM. :)
-Greg


* Re: Best way to store billions of files
From: Roland Rabben @ 2010-08-01 18:23 UTC
  To: Gregory Farnum; +Cc: ceph-devel

Thanks. Copying the response back to the list.

Roland

2010/8/1 Gregory Farnum <gregf@hq.newdream.net>:
> On Sun, Aug 1, 2010 at 11:02 AM, Roland Rabben <roland@jotta.no> wrote:
>> Great. I'll have a look at BTRFS. Any drawbacks with BTRFS? It looks
>> pretty young.
> It is pretty young, but we expect it'll be ready (at least for
> replicated storage) as soon as Ceph is. :)
>
>> So if I understand you correctly: use BTRFS to combine the disks into
>> logical volumes. Perhaps 3 logical volumes across 12 disks each? Then
>> run 3 OSDs, each with 4 GB of RAM.
> Well, actually you'd want to do 3 logical volumes across 11 disks
> each, and save one disk per OSD instance to provide a journaling
> device.
>
>> 6 logical volumes across 6 disks each. Then run 6 OSDs with 2 GB of RAM each.
> We don't really have performance data to determine which of these
> setups will be better for you; you'd have to experiment. Each OSD
> daemon will take up between 200 and 800 MB of RAM to do its work, but
> any extra will be used by the kernel to cache file data, and depending
> on your workload that can be a serious performance advantage.
> It's not like you need to manually partition the RAM or anything, though!
>
>> Does BTRFS handle the failure of a disk in a logical volume? Any
>> RAID 5-like features, where it could continue running with a failed
>> disk and rebuild once the failed disk is replaced?
> Hmm, I don't know. I'm sure somebody on the list does though, if you
> want to move the discussion back on-list. :) (We don't get enough
> traffic to need discussions to stay off-list for traffic reasons or
> anything, and if you keep it on-list Sage [lead developer] will see it
> all.)
>
>> Any performance gains with a larger number of disks in a logical BTRFS volume?
> Not sure. I think btrfs can stripe across disks, but depending on your
> network connection that's more likely to be the limiting factor. :)
> -Greg
>
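
Plugging that 200-800 MB per daemon into our 6 GB nodes, the rough
budget looks like this (just arithmetic; the cache numbers will
obviously depend on workload):

# Rough RAM budget per node, using the 200-800 MB per OSD daemon figure above.
def ram_budget(ram_gb, num_osds, per_osd_mb=(200, 800)):
    lo, hi = (num_osds * m / 1024.0 for m in per_osd_mb)
    return {
        "daemons (GB)": (round(lo, 1), round(hi, 1)),
        "left for page cache (GB)": (round(ram_gb - hi, 1), round(ram_gb - lo, 1)),
    }

print(ram_budget(6, 3))  # 3 OSDs: 0.6-2.3 GB for daemons, ~3.7-5.4 GB left for cache
print(ram_budget(6, 6))  # 6 OSDs: 1.2-4.7 GB for daemons, as little as ~1.3 GB of cache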



-- 
Roland Rabben
Founder & CEO Jotta AS
Cell: +47 90 85 85 39
Phone: +47 21 04 29 00
Email: roland@jotta.no


* Re: Best way to store billions of files
From: Anton VG. @ 2010-08-24  8:53 UTC
  To: Roland Rabben; +Cc: ceph-devel

Just got rid of the sh... named Gluster after 1.5 years of being nervous
about its glitches, and switched to a Linux software RAID 6 array for the
time being... while Ceph stabilizes...

On Sunday 01 August 2010 17:08:49 Roland Rabben wrote:
> [...]

