* Filesystems for ceph
@ 2011-07-27 17:52 Christian Brunner
2011-07-27 22:21 ` Fyodor Ustinov
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Christian Brunner @ 2011-07-27 17:52 UTC (permalink / raw)
To: ceph-devel
We are having quite some problems with the underlying filesystem for
the cosd's and I would like to hear about other experiences. Here is
what we have gone through so far:
btrfs with 2.6.38:
- good performance
- frequently hitting various BUG_ON conditions
btrfs with 2.6.39:
- big performance problems after a few days uptime
- occasionally hitting BUG_ON conditions
btrfs with 3.0:
- big performance problems after a few days uptime
- occasionally hitting a deadlock in the btrfs filesystem (cosd is in D-state)
ext4 with a RHEL6.0 kernel (don't remember exactly):
- almost immediate blowup of the kernel (OOPS)
From what I read in Fyodor's emails, ext4 in 2.6.39 isn't much better.
So, what filesystem would you recommend?
Thanks,
Christian
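[Editorial sketch, not part of the original message.] The "cosd is in D-state" symptom above can be checked from userspace by reading the state field of /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux /proc layout; the function names `parse_stat_line` and `uninterruptible_processes` are hypothetical and do not come from ceph or any kernel tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no procfs here (assumption: non-Linux host)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up only briefly in this list is usually just waiting on I/O; one that stays there across repeated runs is the kind of hang described above.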
* Re: Filesystems for ceph
From: Fyodor Ustinov @ 2011-07-27 22:21 UTC (permalink / raw)
To: ceph-devel
Christian Brunner <chb <at> muc.de> writes:
>
> From what I read in Fyodor's emails, ext4 in 2.6.39 isn't much better.
>
> So, what filesystem would you recommend?
>
Christian, IMHO ext3/ext4/btrfs are unusable for ceph. I tested xfs for some
time and am thinking of using it again. But I think the ceph developers do
not like xfs :)
WBR,
Fyodor.
* Re: Filesystems for ceph
From: Sage Weil @ 2011-07-27 23:33 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel
On Wed, 27 Jul 2011, Fyodor Ustinov wrote:
> Christian Brunner <chb <at> muc.de> writes:
>
> >
> > From what I read in Fyodor's emails, ext4 in 2.6.39 isn't much better.
> >
> > So, what filesystem would you recommend?
> >
>
> Christian, IMHO ext3/ext4/btrfs are unusable for ceph. I tested xfs for some
> time and am thinking of using it again. But I think the ceph developers do
> not like xfs :)
What problems have you seen with ext4?
sage
* Re: Filesystems for ceph
From: Gregory Farnum @ 2011-07-28 0:04 UTC (permalink / raw)
To: chb; +Cc: ceph-devel
Our largest installation hasn't seen any problems with btrfs on
2.6.38.6, although it may not be as busy as the clusters you're
running on.
I would recommend trying a newish stable release though, since I think
the major releases are still introducing new features into btrfs, and
code churn always creates bugs.
-Greg
* Re: Filesystems for ceph
From: Fyodor Ustinov @ 2011-07-28 4:23 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On 07/28/2011 02:33 AM, Sage Weil wrote:
> On Wed, 27 Jul 2011, Fyodor Ustinov wrote:
>> Christian Brunner<chb<at> muc.de> writes:
>>
>>> From what I read in Fyodor's emails, ext4 in 2.6.39 isn't much better.
>>>
>>> So, what filesystem would you recommend?
>>>
>> Christian, IMHO ext3/ext4/btrfs are unusable for ceph. I tested xfs for some
>> time and am thinking of using it again. But I think the ceph developers do
>> not like xfs :)
> What problems have you seen with ext4?
See my message to Greg. In a nutshell: I again have broken ext4 filesystems
on OSD servers running a 2.6.39 kernel.
WBR,
Fyodor.
* Re: Filesystems for ceph
From: Sage Weil @ 2011-07-28 4:35 UTC (permalink / raw)
To: Fyodor Ustinov; +Cc: ceph-devel
On Thu, 28 Jul 2011, Fyodor Ustinov wrote:
> On 07/28/2011 02:33 AM, Sage Weil wrote:
> > On Wed, 27 Jul 2011, Fyodor Ustinov wrote:
> > > Christian Brunner<chb<at> muc.de> writes:
> > >
> > > > From what I read in Fyodor's emails, ext4 in 2.6.39 isn't much better.
> > > >
> > > > So, what filesystem would you recommend?
> > > >
> > > Christian, IMHO ext3/ext4/btrfs are unusable for ceph. I tested xfs for
> > > some time and am thinking of using it again. But I think the ceph
> > > developers do not like xfs :)
> > What problems have you seen with ext4?
>
> See my message to Greg. In a nutshell: I again have broken ext4 filesystems
> on OSD servers running a 2.6.39 kernel.
Oh, you mean the fsck errors? Got it. If you see it again (on another
disk/machine, with a reasonably fresh fs) the ext4 guys will be interested
in hearing about it. It's probably in the xattr code, which ceph uses
heavily but most people only set once per file (for selinux labels), if
at all.
sage
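[Editorial sketch, not part of the original message.] To illustrate the difference between setting an xattr once per file and rewriting it heavily, here is a minimal sketch that rewrites a single user.* attribute many times and counts the successful round trips. The helper name `rewrite_xattr_repeatedly` and the attribute name `user.churn` are hypothetical; this is not Ceph's actual access pattern and is not expected to reproduce the fsck errors by itself. It assumes Linux, where Python exposes `os.setxattr` and `os.getxattr`.

```python
import os
import tempfile


def rewrite_xattr_repeatedly(directory, rounds=1000, size=64):
    """Rewrite one xattr `rounds` times; return how many round trips succeeded."""
    if not hasattr(os, "setxattr"):
        return 0  # os.setxattr is only available on Linux
    done = 0
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            for i in range(rounds):
                # Change the value every round so each write is a real update.
                value = bytes([i % 256]) * size
                os.setxattr(probe.name, "user.churn", value)
                if os.getxattr(probe.name, "user.churn") != value:
                    break
                done += 1
    except OSError:
        pass  # filesystem rejected the xattr or the directory is unusable
    return done


if __name__ == "__main__":
    print(rewrite_xattr_repeatedly(tempfile.gettempdir(), rounds=100))
```

A return value of 0 most likely means the filesystem under `directory` does not accept user xattrs at all, which is worth ruling out before suspecting a bug.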
* Re: Filesystems for ceph
From: Sage Weil @ 2011-07-28 4:39 UTC (permalink / raw)
To: Christian Brunner; +Cc: ceph-devel
On Wed, 27 Jul 2011, Christian Brunner wrote:
> We are having quite some problems with the underlying filesystem for
> the cosd's and I would like to hear about other experiences. Here is
> what we have gone through so far:
>
> btrfs with 2.6.38:
>
> - good performance
> - frequently hitting various BUG_ON conditions
>
> btrfs with 2.6.39:
>
> - big performance problems after a few days uptime
> - occasionally hitting BUG_ON conditions
>
> btrfs with 3.0:
>
> - big performance problems after a few days uptime
> - occasionally hitting a deadlock in the btrfs filesystem (cosd is in D-state)
You might try 3.0 + the latest stuff Chris just sent to Linus (at least
for the D-state problem). I'm eager to see whether the latencytop info
you sent is helpful for the rest.
FWIW we're running 2.6.38+ at the moment. We get some warnings with the
orphan code (fixed in .39), but it's otherwise been reasonably stable.
sage
* Re: Filesystems for ceph
From: srimugunthan dhandapani @ 2011-07-28 7:35 UTC (permalink / raw)
To: Sage Weil; +Cc: Christian Brunner, ceph-devel
On Thu, Jul 28, 2011 at 10:09 AM, Sage Weil <sage@newdream.net> wrote:
> On Wed, 27 Jul 2011, Christian Brunner wrote:
>> We are having quite some problems with the underlying filesystem for
>> the cosd's and I would like to hear about other experiences. Here is
>> what we have gone through so far:
>>
>> btrfs with 2.6.38:
>>
>> - good performance
>> - frequently hitting various BUG_ON conditions
>>
>> btrfs with 2.6.39:
>>
>> - big performance problems after a few days uptime
>> - occasionally hitting BUG_ON conditions
>>
>> btrfs with 3.0:
>>
>> - big performance problems after a few days uptime
>> - occasionally hitting a deadlock in the btrfs filesystem (cosd is in D-state)
>
> You might try 3.0 + the latest stuff Chris just sent to Linus (at least
> for the D-state problem). I'm eager to see whether the latencytop info
> you sent is helpful for the rest.
>
> FWIW we're running 2.6.38+ at the moment. We get some warnings with the
> orphan code (fixed in .39), but it's otherwise been reasonably stable.
I thought that ext3/4 and btrfs were the only recommended underlying
filesystems for ceph. I didn't know we could use xfs.
With GlusterFS, theoretically, we can use any filesystem with extended
attribute support as the backend filesystem. Is that also the case with
ceph?
Though it may not be optimal, can I theoretically use a "filesystem X"
with extended attribute support for ceph?
thanks,
Mugunthan
* Re: Filesystems for ceph
From: Gregory Farnum @ 2011-07-28 16:00 UTC (permalink / raw)
To: srimugunthan dhandapani; +Cc: ceph-devel
On Thu, Jul 28, 2011 at 12:35 AM, srimugunthan dhandapani
<srimugunthan.dhandapani@gmail.com> wrote:
> I thought that ext3/4 and btrfs were the only recommended underlying
> filesystems for ceph. I didn't know we could use xfs.
> With GlusterFS, theoretically, we can use any filesystem with extended
> attribute support as the backend filesystem. Is that also the case with
> ceph?
> Though it may not be optimal, can I theoretically use a "filesystem X"
> with extended attribute support for ceph?
> thanks,
> Mugunthan
These days you can use any filesystem with xattr support as the
backing store for Ceph, yes. btrfs is our primary target for a few
reasons (unlimited xattrs, built-in snapshots make snapshots and
consistency easier, etc) but you can stick whatever you like back
there and Ceph will handle it with writeahead journaling, manual
copying for snapshots, etc etc. :)
-Greg
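[Editorial sketch, not part of the original message.] Since the only stated requirement is xattr support, a candidate backing filesystem can be probed before an OSD is placed on it. This is a minimal sketch, assuming Linux and the user.* xattr namespace; the helper name `supports_user_xattrs` and the attribute name `user.probe` are hypothetical and are not part of Ceph.

```python
import os
import tempfile


def supports_user_xattrs(directory):
    """Return True if a file created in `directory` accepts a user.* xattr."""
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr is only available on Linux
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            os.setxattr(probe.name, "user.probe", b"1")
            return os.getxattr(probe.name, "user.probe") == b"1"
    except OSError:
        # Typical causes: filesystem lacks xattr support, it is mounted
        # without user xattrs enabled, or the directory does not exist.
        return False


if __name__ == "__main__":
    print(supports_user_xattrs(tempfile.gettempdir()))
```

This only shows that one small attribute can be stored; it says nothing about per-file xattr size limits, which differ between filesystems.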
* Re: Filesystems for ceph
From: Fyodor Ustinov @ 2011-07-28 18:27 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On 07/28/2011 07:35 AM, Sage Weil wrote:
> Oh, you mean the fsck errors? Got it. If you see it again (on another
> disk/machine, with a reasonably fresh fs) the ext4 guys will be interested
> in hearing about it. It's probably in the xattr code, which ceph uses
> heavily but most people only set once per file (for selinux labels), if
> at all.
I now see it on _all_ 12 OSDs (8 physical servers; each OSD has its own fs).
I plan to upgrade the kernel on the OSD servers to 3.0, and if that does not
help, it will be time to shake the guys from the ext4 team (with your help).
WBR,
Fyodor.