From: Roland Rabben <roland@jotta.no>
To: ceph-devel@vger.kernel.org
Subject: Best way to store billions of files
Date: Sun, 1 Aug 2010 14:08:49 +0200
Message-ID: <AANLkTi=YSaX9tnynEHj4rS=1wOcTrmvAds7XNnZB1sGj@mail.gmail.com>

I am researching alternatives to GlusterFS, which I am currently using.
My need is to store billions of files (big and small), and I am trying
to find out if there are any considerations I should make when
planning folder structure and server config using Ceph.

On my GlusterFS system things seem to slow down dramatically as I
grow the number of files. A simple ls takes forever. So I am looking
for alternatives.

Right now my folder structure looks like this:

Users are grouped into folders named /000, /001, ... /999, using a hash.
Each user has their own folder inside the numbered folders.
Inside each user folder, the user's files are stored in folders named
/000, /001, ... /999, also using a hash.
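
To make the layout concrete, here is a minimal sketch of how a file path
could be derived under this scheme. The choice of SHA-256, the modulo-1000
bucketing, and the names bucket() and file_path() are assumptions for
illustration only; the message does not say which hash is actually used.

```python
import hashlib

BUCKETS = 1000  # folders /000 ... /999


def bucket(name: str) -> str:
    """Map a name to a three-digit bucket folder, '000' through '999'.

    SHA-256 is an assumed stand-in for the unspecified hash.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return f"{int.from_bytes(digest[:8], 'big') % BUCKETS:03d}"


def file_path(user: str, filename: str) -> str:
    """Build /<user bucket>/<user>/<file bucket>/<filename>."""
    return f"/{bucket(user)}/{user}/{bucket(filename)}/{filename}"


if __name__ == "__main__":
    # Hypothetical user and file names.
    print(file_path("alice", "photo.jpg"))
```

With this shape, each top-level folder holds roughly 1/1000 of the users and
each user folder holds at most 1000 subfolders, so the size of any single
directory is driven by the number of files per user divided by 1000.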

Would this folder structure or the number of files become a problem with Ceph?

I generally use 4U storage nodes with 36 x 1.5 TB or 2 TB SATA drives,
an 8-core CPU and 6 GB RAM. My application is write once and read many.
What recommendations would you give with regards to setting up the
filesystem on the storage nodes? ext3? ext4? lvm? RAID?

Today I am mounting all disks as individual ext3 partitions and tying
them together with GlusterFS. Would this work with Ceph or would you
recommend making one large LVM volume on each storage node that you
expose to Ceph?

I know Ceph is not production ready yet, but from the activity on this
mailing list things look promising.

Best regards
Roland Rabben

Thread overview: 4+ messages
2010-08-01 12:08 Roland Rabben [this message]
2010-08-01 16:17 ` Best way to store billions of files Gregory Farnum
     [not found]   ` <AANLkTimaUsbQTAmOi0TXquOnfJvPMmBkMSRSKuicPvrh@mail.gmail.com>
     [not found]     ` <AANLkTikJSaZx9O2HGzNUJ6YQ5fhtMmE_vfNzQPH=Ferv@mail.gmail.com>
2010-08-01 18:23       ` Roland Rabben
2010-08-24  8:53 ` Anton VG.
