* production ready?
@ 2010-10-13 17:14 Dwight Schauer
       [not found] ` <AANLkTikL-wSkv4uTUM_g5Cs9=k3Q8TNkWXa0KNtnutXJ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Dwight Schauer @ 2010-10-13 17:14 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello NILFS2 users and developers,

How stable do you consider NILFS2? Performance issues aside, I'm more
concerned about reliability.

I'm considering NILFS2 for a new server setup, as BTRFS does not
appear to be production ready yet from what I've read.

So, what I'm asking is this: is NILFS2 considered production ready for servers?

Sincerely,
Dwight Schauer

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
       [not found] ` <AANLkTikL-wSkv4uTUM_g5Cs9=k3Q8TNkWXa0KNtnutXJ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-14  5:50   ` Ryusuke Konishi
       [not found]     ` <20101014.145018.209367253.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Ryusuke Konishi @ 2010-10-14  5:50 UTC (permalink / raw)
  To: dschauer-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
On Wed, 13 Oct 2010 12:14:46 -0500, Dwight Schauer wrote:
> Hello NILFS2 users and developers,
> 
> How stable do you consider NILFS2? Performance issues aside, I'm more
> concerned about reliability.
> 
> I'm considering NILFS2 for a new server setup, as BTRFS does not
> appear to be production ready yet from what I've read.
> 
> So, what I'm asking is this: is NILFS2 considered production ready for servers?
> 
> Sincerely,
> Dwight Schauer

NILFS2 is almost stable.  We have a record of nine months of operation
on in-house Samba servers and a WebDAV server.

However, a few crash problems have been reported on this mailing list
and remain unresolved.  I suspect they are caused by some sort of bug
related to the "garbage collection" mechanism, which often causes
trouble in this sort of filesystem.

I think you may as well wait for these issues to be resolved and test
thoroughly if you are actually going to use it for business.  NILFS2 is
still marked "experimental" in the kernel, as is BTRFS.

Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
       [not found]     ` <20101014.145018.209367253.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2010-10-14 12:25       ` Jérôme Poulin
       [not found]         ` <AANLkTi=hk=xSFLddi5+YOpZNPPDR7rO9Y+zF8N3+Wcdy-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Jérôme Poulin @ 2010-10-14 12:25 UTC (permalink / raw)
  To: dschauer; +Cc: linux-nilfs, Ryusuke Konishi

On Thu, Oct 14, 2010 at 1:50 AM, Ryusuke Konishi
<konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> wrote:
> Hi,
...
> NILFS2 is almost stable.  We have a record of nine months of operation
> on in-house Samba servers and a WebDAV server.
>
> On Wed, 13 Oct 2010 12:14:46 -0500, Dwight Schauer wrote:
>> Hello NILFS2 users and developers,
>>
>> How stable do you consider NILFS2? Performance issues aside, I'm more
>> concerned about reliability.
>>

On my side, it has been on my laptop's /home partition for a bit more
than a year. The first month there was corruption because of a bug,
which was fixed; then two months later there was corruption caused by
the garbage collector not acting correctly when the date is adjusted
backward (in fact, forward then backward in my case). But now it is
corruption free, and I can see it when making a backup: no more I/O
errors! I had some partition-full problems because the garbage collector
wouldn't start doing its job until I restarted it, but otherwise it's OK.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
       [not found]         ` <AANLkTi=hk=xSFLddi5+YOpZNPPDR7rO9Y+zF8N3+Wcdy-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-14 12:55           ` Dwight Schauer
       [not found]             ` <AANLkTik7tjKVBrJ_P83sitOVzeGk70AQCJF_CrwG6hYU-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Dwight Schauer @ 2010-10-14 12:55 UTC (permalink / raw)
  To: linux-nilfs

On Thu, Oct 14, 2010 at 7:25 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> On my side, it has been on my laptop's /home partition for a bit more than a year...

On Thu, Oct 14, 2010 at 1:50 AM, Ryusuke Konishi
<konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> wrote:
> NILFS2 is almost stable.  We have a record of nine months of operation
> on in-house Samba servers and a WebDAV server.

OK, Thanks.

There is still a warning when mounting verbosely, though:
  mount.nilfs2: WARNING! - The NILFS on-disk format may change at any time.
  mount.nilfs2: WARNING! - Do not place critical data on a NILFS filesystem.

As per the first message, is the on-disk format expected to change any
time soon? In other words, what is the likelihood of my using NILFS with
2.6.35.x and then having some future kernel upgrade render my
NILFS-formatted filesystems unmountable because the on-disk format changed?

I understand the disclaimer in the second message.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
       [not found]             ` <AANLkTik7tjKVBrJ_P83sitOVzeGk70AQCJF_CrwG6hYU-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-18  2:57               ` Ryusuke Konishi
  0 siblings, 0 replies; 22+ messages in thread
From: Ryusuke Konishi @ 2010-10-18  2:57 UTC (permalink / raw)
  To: dschauer-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 14 Oct 2010 07:55:44 -0500, Dwight Schauer wrote:
> On Thu, Oct 14, 2010 at 7:25 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> > On my side, it has been on my laptop's /home partition for a bit more than a year...
> 
> On Thu, Oct 14, 2010 at 1:50 AM, Ryusuke Konishi
> <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> wrote:
> > NILFS2 is almost stable.  We have a record of nine months of operation
> > on in-house Samba servers and a WebDAV server.
> 
> OK, Thanks.
> 
> There is still a warning when mounting verbosely, though:
>   mount.nilfs2: WARNING! - The NILFS on-disk format may change at any time.
>   mount.nilfs2: WARNING! - Do not place critical data on a NILFS filesystem.
> 
> As per the first message, is the on-disk format expected to change any
> time soon?

Well, I don't want to change the disk format in a way that breaks
compatibility.  I'm considering removing the above message in the next
utility release.

We may still have to break compatibility to implement essential
features like extended attributes/POSIX ACLs.  However, I think such
far-reaching changes should be carefully avoided, or limited to the
minimum, at this stage.

> In other words, what is the likelihood of my using NILFS with
> 2.6.35.x and then having some future kernel upgrade render my
> NILFS-formatted filesystems unmountable because the on-disk format changed?

I hope this never happens.  We have already started to use nilfs2 for
in-house systems.  My comment above is about forward compatibility
(i.e. the ability of older implementations to read a partition
generated by a newer version); I won't break backward compatibility
except for some extraordinary reason.

Thanks,
Ryusuke Konishi
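
To illustrate the forward/backward-compatibility distinction drawn above, here is
a small Python sketch modeled on the compat/incompat feature-flag scheme used by
several Linux filesystems; the flag names are hypothetical and are not NILFS2's
actual on-disk fields:

  # Hypothetical feature flags for illustration only; NOT real NILFS2 on-disk fields.
  # Modeled on the compat/incompat scheme several Linux filesystems use.
  def old_driver_can_mount(incompat_flags_on_disk, flags_driver_knows):
      """Forward compatibility in the sense used above: an older driver can still
      mount a newer partition as long as every 'incompat' flag set on disk is one
      it understands; purely 'compat' features can simply be ignored."""
      return incompat_flags_on_disk <= flags_driver_knows

  old_driver = set()  # an older implementation that knows none of the newer features

  # A newer mkfs that only adds ignorable ('compat') features leaves the incompat
  # set empty, so the old driver still mounts the partition:
  print(old_driver_can_mount(set(), old_driver))                # True

  # Enabling a hypothetical incompatible feature would break forward compatibility:
  print(old_driver_can_mount({"xattr_layout_v2"}, old_driver))  # False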

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 21:17                       ` Gandalf Corvotempesta
@ 2012-10-30 21:21                         ` Sage Weil
  0 siblings, 0 replies; 22+ messages in thread
From: Sage Weil @ 2012-10-30 21:21 UTC (permalink / raw)
  To: Gandalf Corvotempesta
  Cc: Dan Mick, 袁冬, Gregory Farnum, ceph-devel

On Tue, 30 Oct 2012, Gandalf Corvotempesta wrote:
> 2012/10/30 Dan Mick <dan.mick@inktank.com>:
> > Generally that's considered OK.  ceph-mon doesn't use very much disk or CPU
> > or network bandwidth.
> 
> In this case, should I reserve some space for ceph-mon (a partition or
> a dedicated disk), or is ceph-mon able to 'share' the OSD disk space
> automatically (for example, using a directory)?

A common pattern is to give it a directory on the OS/boot volume.  This 
can be a dedicated disk (lots of space for logs) or something carved out 
of another disk (more space for ceph data, but can interfere with ceph-osd 
performance).

sage

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 21:06                     ` Dan Mick
@ 2012-10-30 21:17                       ` Gandalf Corvotempesta
  2012-10-30 21:21                         ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 21:17 UTC (permalink / raw)
  To: Dan Mick; +Cc: 袁冬, Gregory Farnum, ceph-devel

2012/10/30 Dan Mick <dan.mick@inktank.com>:
> Generally that's considered OK.  ceph-mon doesn't use very much disk or CPU
> or network bandwidth.

In this case, should I reserve some space for ceph-mon (a partition or
a dedicated disk), or is ceph-mon able to 'share' the OSD disk space
automatically (for example, using a directory)?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 14:59                   ` Gandalf Corvotempesta
@ 2012-10-30 21:06                     ` Dan Mick
  2012-10-30 21:17                       ` Gandalf Corvotempesta
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Mick @ 2012-10-30 21:06 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: 袁冬, Gregory Farnum, ceph-devel



On 10/30/2012 07:59 AM, Gandalf Corvotempesta wrote:
> 2012/10/30 袁冬 <yuandong1222@gmail.com>:
>> Yes, but network (and many other issues) must be considered.
>
> Obviously
>
>> 3 is suggested.
>
> Any contraindication to running a mon on the same OSD server?
>

Generally that's considered OK.  ceph-mon doesn't use very much disk or 
CPU or network bandwidth.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:45           ` Gregory Farnum
@ 2012-10-30 20:32             ` Stefan Priebe
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Priebe @ 2012-10-30 20:32 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Dan Mick, ceph-devel

On 30.10.2012 14:45, Gregory Farnum wrote:
>> But there's still the problem of slow random write IOPS. At least I
>> haven't seen any good benchmarks.
>
> It's not magic — I haven't done extensive testing but I believe people
> see aggregate IOPS of about what you can calculate:
> (number of storage disks * IOPS per disk) / (replication level)
> The journaling bumps that up a little bit for bursts, of course;
> similarly if you're doing it on a brand new RBD image it can be a bit
> slower since you need to create all the objects as well as write data
> to them. You need to architect your storage system to match your
> requirements. If you want to run write-heavy databases on RBD, there
> are people doing that. They're using SSDs and are very pleased with
> its performance. *shrug*

My last test was with 0.49, so I can't speak for 0.52, but as far as I
know nothing has changed in this regard.

I had 6 dedicated servers, each with 4x Intel 520 series SSDs running
4 OSDs (one OSD per disk). I had the journal running in a 1 GB tmpfs
to be sure it wasn't the bottleneck. Replication was set to 2.

Each SSD is capable of 30,000 random 4k IOPS.

But with RBD I wasn't able to get more than 20,000 IOPS, even though in
theory I had:
6 dedicated servers * 4 SSDs => 24 OSDs/SSDs * 30,000 IOPS / replication 2
=> 360,000 IOPS theoretical overall performance

I didn't get more than 20,000 even with 3.6 GHz Xeon CPUs and dual 10 GbE.

Greets,
Stefan
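
As a quick sanity check of the rule of thumb quoted above, here is a minimal
Python sketch that plugs in the numbers from this message (24 SSDs at 30,000
random 4k IOPS, replication 2); the function name is illustrative only:

  def expected_aggregate_iops(num_disks, iops_per_disk, replication):
      """Greg's rule of thumb: (number of storage disks * IOPS per disk) / (replication level)."""
      return num_disks * iops_per_disk / replication

  # Stefan's setup: 6 servers * 4 SSDs = 24 OSDs, 30,000 random 4k IOPS each, replication 2.
  theoretical = expected_aggregate_iops(num_disks=6 * 4, iops_per_disk=30_000, replication=2)
  observed = 20_000
  print(f"theoretical ceiling: {theoretical:,.0f} IOPS")   # 360,000 IOPS
  print(f"observed: {observed:,} IOPS ({observed / theoretical:.0%} of the ceiling)")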

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 14:38                 ` 袁冬
@ 2012-10-30 14:59                   ` Gandalf Corvotempesta
  2012-10-30 21:06                     ` Dan Mick
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 14:59 UTC (permalink / raw)
  To: 袁冬; +Cc: Gregory Farnum, Dan Mick, ceph-devel

2012/10/30 袁冬 <yuandong1222@gmail.com>:
> Yes, but network (and many other issues) must be considered.

Obviously

> 3 is suggested.

Any contraindication to running a mon on the same OSD server?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 14:31               ` Gandalf Corvotempesta
@ 2012-10-30 14:38                 ` 袁冬
  2012-10-30 14:59                   ` Gandalf Corvotempesta
  0 siblings, 1 reply; 22+ messages in thread
From: 袁冬 @ 2012-10-30 14:38 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Gregory Farnum, Dan Mick, ceph-devel

> In this case, can a single block device (for example, a huge virtual
> machine image) be striped across many OSDs to achieve better read
> performance? An image striped across 3 disks should get 3*IOPS when reading.
Yes, but network (and many other issues) must be considered.


> Another question: in a standard RGW/RBD infrastructure (no CephFS), I
> have to configure only "mon" and "osd" nodes, right?
Yes.

> How many monitor nodes are suggested?
3 is suggested.
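
Three is the usual suggestion because the monitors need a strict majority to
keep a quorum; a rough sketch of that arithmetic, for illustration only (not
Ceph code):

  def tolerated_mon_failures(num_mons):
      """Ceph monitors need a strict majority to keep a quorum, so a cluster of N
      monitors stays usable only while more than N/2 of them are up."""
      return (num_mons - 1) // 2

  for n in (1, 2, 3, 5):
      print(f"{n} monitor(s): survives {tolerated_mon_failures(n)} monitor failure(s)")
  # Two monitors tolerate no more failures than one, which is why an odd count
  # (three for most clusters) is the usual suggestion.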


On 30 October 2012 22:31, Gandalf Corvotempesta
<gandalf.corvotempesta@gmail.com> wrote:
> 2012/10/30 袁冬 <yuandong1222@gmail.com>:
>> RGW and libRBD do not use the same pool; you can't access an RBD volume
>> with RGW. RADOS treats the RBD volume as just one large object.
>
> OK, I think I have understood. RADOS stores only objects, on an existing
> filesystem (this is why I have to create an FS to use RADOS/Ceph).
> Now, if that object is accessed by RGW, it will be a single file
> stored on the FS, but if I'm accessing it with RBD, RBD is masking a
> very large object, stored on the FS, as if it were a single block device.
> In this case, can a single block device (for example, a huge virtual
> machine image) be striped across many OSDs to achieve better read
> performance? An image striped across 3 disks should get 3*IOPS when reading.
>
>
> Another question: in a standard RGW/RBD infrastructure (no CephFS), I
> have to configure only "mon" and "osd" nodes, right?
> How many monitor nodes are suggested?



-- 
袁冬
Email:yuandong1222@gmail.com
QQ:10200230

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 14:17             ` 袁冬
@ 2012-10-30 14:31               ` Gandalf Corvotempesta
  2012-10-30 14:38                 ` 袁冬
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 14:31 UTC (permalink / raw)
  To: 袁冬; +Cc: Gregory Farnum, Dan Mick, ceph-devel

2012/10/30 袁冬 <yuandong1222@gmail.com>:
> RGW and libRBD do not use the same pool; you can't access an RBD volume
> with RGW. RADOS treats the RBD volume as just one large object.

OK, I think I have understood. RADOS stores only objects, on an existing
filesystem (this is why I have to create an FS to use RADOS/Ceph).
Now, if that object is accessed by RGW, it will be a single file
stored on the FS, but if I'm accessing it with RBD, RBD is masking a
very large object, stored on the FS, as if it were a single block device.
In this case, can a single block device (for example, a huge virtual
machine image) be striped across many OSDs to achieve better read
performance? An image striped across 3 disks should get 3*IOPS when reading.


Another question: in a standard RGW/RBD infrastructure (no CephFS), I
have to configure only "mon" and "osd" nodes, right?
How many monitor nodes are suggested?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:57           ` Gandalf Corvotempesta
@ 2012-10-30 14:17             ` 袁冬
  2012-10-30 14:31               ` Gandalf Corvotempesta
  0 siblings, 1 reply; 22+ messages in thread
From: 袁冬 @ 2012-10-30 14:17 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Gregory Farnum, Dan Mick, ceph-devel

> Nothing prevents me from offering a service directly based on the RADOS
> API, if S3 compatibility is not needed, right?

Correct, that is librados.


> What I don't understand is how I can access a single file from RGW. If
> libRBD and RGW are 'gateways' to a RADOS store, I'll have access to a
> block device, not to a single file.
> Should I create a filesystem on the block device before using it with
> RGW?

RGW and libRBD do not use the same pool; you can't access an RBD volume
with RGW. RADOS treats the RBD volume as just one large object.

On 30 October 2012 21:57, Gandalf Corvotempesta
<gandalf.corvotempesta@gmail.com> wrote:
>
> 2012/10/30 Gregory Farnum <greg@inktank.com>:
> > Not exactly. RADOS is natively a (powerful) object store. RGW takes S3
> > and Swift REST requests and translates them into RADOS requests,
> > stored in a "custom" format. RBD is a client-side library which takes
> > a logical block device and stripes it over RADOS objects (by default,
> > the first 4MB is one object, the second 4MB are another object, etc).
> > Make sense?
>
> So, a Ceph cluster is made from multiple OSDs.
> These OSDs are combined by RADOS, which is an object store that
> stripes over multiple OSDs.
>
> This store can be accessed by RGW (for S3 and Swift API compatibility,
> if needed) or directly by a server as a block device with librbd.
>
> This should be the architecture:
>
> OSD -> RADOS -> RGW/libRBD -> Customer/Server
>
> Nothing prevents me from offering a service directly based on the RADOS
> API, if S3 compatibility is not needed, right?
>
> What I don't understand is how I can access a single file from RGW. If
> libRBD and RGW are 'gateways' to a RADOS store, I'll have access to a
> block device, not to a single file.
> Should I create a filesystem on the block device before using it with
> RGW?




--
袁冬
Email:yuandong1222@gmail.com
QQ:10200230

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:40         ` Gregory Farnum
@ 2012-10-30 13:57           ` Gandalf Corvotempesta
  2012-10-30 14:17             ` 袁冬
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 13:57 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Dan Mick, ceph-devel

2012/10/30 Gregory Farnum <greg@inktank.com>:
> Not exactly. RADOS is natively a (powerful) object store. RGW takes S3
> and Swift REST requests and translates them into RADOS requests,
> stored in a "custom" format. RBD is a client-side library which takes
> a logical block device and stripes it over RADOS objects (by default,
> the first 4MB is one object, the second 4MB are another object, etc).
> Make sense?

So, a Ceph cluster is made from multiple OSDs.
These OSDs are combined by RADOS, which is an object store that
stripes over multiple OSDs.

This store can be accessed by RGW (for S3 and Swift API compatibility,
if needed) or directly by a server as a block device with librbd.

This should be the architecture:

OSD -> RADOS -> RGW/libRBD -> Customer/Server

Nothing prevents me from offering a service directly based on the RADOS
API, if S3 compatibility is not needed, right?

What I don't understand is how I can access a single file from RGW. If
libRBD and RGW are 'gateways' to a RADOS store, I'll have access to a
block device, not to a single file.
Should I create a filesystem on the block device before using it with RGW?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:38         ` Stefan Priebe - Profihost AG
@ 2012-10-30 13:45           ` Gregory Farnum
  2012-10-30 20:32             ` Stefan Priebe
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Farnum @ 2012-10-30 13:45 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Dan Mick, ceph-devel

On Tue, Oct 30, 2012 at 2:38 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> On 30.10.2012 14:36, Gandalf Corvotempesta wrote:
>
>> 2012/10/30 Gregory Farnum <greg@inktank.com>:
>>>
>>> Not a lot of people are publicly discussing their sizes on things like
>>> that, unfortunately. I believe DreamHost is still the most open. They
>>> have an (RGW-based) object storage service which is backed by ~800
>>> OSDs and are currently beta-testing a compute service using RBD, which
>>> you can see described here:
>>> http://www.youtube.com/watch?v=l_8Y988fO44&feature=plcp
>
>
> But there's still the problem of slow random write IOPS. At least I
> haven't seen any good benchmarks.

It's not magic — I haven't done extensive testing but I believe people
see aggregate IOPS of about what you can calculate:
(number of storage disks * IOPS per disk) / (replication level)
The journaling bumps that up a little bit for bursts, of course;
similarly if you're doing it on a brand new RBD image it can be a bit
slower since you need to create all the objects as well as write data
to them. You need to architect your storage system to match your
requirements. If you want to run write-heavy databases on RBD, there
are people doing that. They're using SSDs and are very pleased with
its performance. *shrug*
-Greg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:36       ` Gandalf Corvotempesta
  2012-10-30 13:38         ` Stefan Priebe - Profihost AG
@ 2012-10-30 13:40         ` Gregory Farnum
  2012-10-30 13:57           ` Gandalf Corvotempesta
  1 sibling, 1 reply; 22+ messages in thread
From: Gregory Farnum @ 2012-10-30 13:40 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Dan Mick, ceph-devel

On Tue, Oct 30, 2012 at 2:36 PM, Gandalf Corvotempesta
<gandalf.corvotempesta@gmail.com> wrote:
> 2012/10/30 Gregory Farnum <greg@inktank.com>:
>> Not a lot of people are publicly discussing their sizes on things like
>> that, unfortunately. I believe DreamHost is still the most open. They
>> have an (RGW-based) object storage service which is backed by ~800
>> OSDs and are currently beta-testing a compute service using RBD, which
>> you can see described here:
>> http://www.youtube.com/watch?v=l_8Y988fO44&feature=plcp
>
> I'm watching it right now. It seems interesting.
> Please let me know if I understand Ceph properly:
> RADOS is the block storage.
> RADOS can be accessed through RGW (a REST gateway) or through librbd.

Not exactly. RADOS is natively a (powerful) object store. RGW takes S3
and Swift REST requests and translates them into RADOS requests,
stored in a "custom" format. RBD is a client-side library which takes
a logical block device and stripes it over RADOS objects (by default,
the first 4MB is one object, the second 4MB are another object, etc).
Make sense?
-Greg

> In the first case, we will have an object store; in the second case,
> we will have a block device connected directly to a server (like an
> iSCSI block device).
>
> But in the first case, should I create a filesystem on RBD and then
> manage that FS with the gateway?
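
To make the striping described above concrete, here is a small illustrative
Python sketch (not the actual librbd code) that maps a logical block-device
byte offset to the object it would land in, assuming the default 4 MB object
size mentioned in this message:

  OBJECT_SIZE = 4 * 1024 * 1024  # the default 4 MB object size mentioned above

  def rbd_offset_to_object(byte_offset):
      """Map a logical block-device byte offset to (object index, offset inside that object)."""
      return byte_offset // OBJECT_SIZE, byte_offset % OBJECT_SIZE

  # The first 4 MB of the device lands in object 0, the next 4 MB in object 1, etc.
  for off in (0, 4 * 1024 * 1024, 10 * 1024 * 1024):
      obj, within = rbd_offset_to_object(off)
      print(f"byte offset {off:>10d} -> object {obj}, offset {within} within that object")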

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:36       ` Gandalf Corvotempesta
@ 2012-10-30 13:38         ` Stefan Priebe - Profihost AG
  2012-10-30 13:45           ` Gregory Farnum
  2012-10-30 13:40         ` Gregory Farnum
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-10-30 13:38 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Gregory Farnum, Dan Mick, ceph-devel

On 30.10.2012 14:36, Gandalf Corvotempesta wrote:
> 2012/10/30 Gregory Farnum <greg@inktank.com>:
>> Not a lot of people are publicly discussing their sizes on things like
>> that, unfortunately. I believe DreamHost is still the most open. They
>> have an (RGW-based) object storage service which is backed by ~800
>> OSDs and are currently beta-testing a compute service using RBD, which
>> you can see described here:
>> http://www.youtube.com/watch?v=l_8Y988fO44&feature=plcp

But there's still the problem of slow random write IOPS. At least I
haven't seen any good benchmarks.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 13:15     ` Gregory Farnum
@ 2012-10-30 13:36       ` Gandalf Corvotempesta
  2012-10-30 13:38         ` Stefan Priebe - Profihost AG
  2012-10-30 13:40         ` Gregory Farnum
  0 siblings, 2 replies; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 13:36 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Dan Mick, ceph-devel

2012/10/30 Gregory Farnum <greg@inktank.com>:
> Not a lot of people are publicly discussing their sizes on things like
> that, unfortunately. I believe DreamHost is still the most open. They
> have an (RGW-based) object storage service which is backed by ~800
> OSDs and are currently beta-testing a compute service using RBD, which
> you can see described here:
> http://www.youtube.com/watch?v=l_8Y988fO44&feature=plcp

I'm watching it right now. It seems interesting.
Please let me know if I understand Ceph properly:
RADOS is the block storage.
RADOS can be accessed through RGW (a REST gateway) or through librbd.

In the first case, we will have an object store; in the second case,
we will have a block device connected directly to a server (like an
iSCSI block device).

But in the first case, should I create a filesystem on RBD and then
manage that FS with the gateway?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-30 11:35   ` Gandalf Corvotempesta
@ 2012-10-30 13:15     ` Gregory Farnum
  2012-10-30 13:36       ` Gandalf Corvotempesta
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Farnum @ 2012-10-30 13:15 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Dan Mick, ceph-devel

Not a lot of people are publicly discussing their sizes on things like
that, unfortunately. I believe DreamHost is still the most open. They
have an (RGW-based) object storage service which is backed by ~800
OSDs and are currently beta-testing a compute service using RBD, which
you can see described here:
http://www.youtube.com/watch?v=l_8Y988fO44&feature=plcp

On Tue, Oct 30, 2012 at 12:35 PM, Gandalf Corvotempesta
<gandalf.corvotempesta@gmail.com> wrote:
> 2012/10/29 Dan Mick <dan.mick@inktank.com>:
>> There are sites using them in production now.
>
> Any docs about infrastructure size, topology, performance and so on?
>
> I'm evaluating many distributed systems and I would like to have some
> feedback/use cases.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-29 20:21 ` Dan Mick
@ 2012-10-30 11:35   ` Gandalf Corvotempesta
  2012-10-30 13:15     ` Gregory Farnum
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-30 11:35 UTC (permalink / raw)
  To: Dan Mick; +Cc: ceph-devel

2012/10/29 Dan Mick <dan.mick@inktank.com>:
> There are sites using them in production now.

Any docs about infrastructure size, topology, performance and so on?

I'm evaluating many distributed systems and I would like to have some
feedback/use cases.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: production ready?
  2012-10-26 21:52 Gandalf Corvotempesta
@ 2012-10-29 20:21 ` Dan Mick
  2012-10-30 11:35   ` Gandalf Corvotempesta
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Mick @ 2012-10-29 20:21 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: ceph-devel



On 10/26/2012 02:52 PM, Gandalf Corvotempesta wrote:
> Hi all, I'm new to Ceph.
> Are RBD and the REST API production ready?

There are sites using them in production now.

> Do you have any use cases to share? We are looking for distributed
> block storage for an HP C7000 blade chassis with 16 dual-processor X5675
> blades with 64/128 GB RAM each.

Ceph should certainly be able to handle that hardware.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* production ready?
@ 2012-10-26 21:52 Gandalf Corvotempesta
  2012-10-29 20:21 ` Dan Mick
  0 siblings, 1 reply; 22+ messages in thread
From: Gandalf Corvotempesta @ 2012-10-26 21:52 UTC (permalink / raw)
  To: ceph-devel

Hi all, I'm new to Ceph.
Are RBD and the REST API production ready?
Do you have any use cases to share? We are looking for distributed
block storage for an HP C7000 blade chassis with 16 dual-processor X5675
blades with 64/128 GB RAM each.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-10-30 21:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-13 17:14 production ready? Dwight Schauer
     [not found] ` <AANLkTikL-wSkv4uTUM_g5Cs9=k3Q8TNkWXa0KNtnutXJ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-14  5:50   ` Ryusuke Konishi
     [not found]     ` <20101014.145018.209367253.ryusuke-sG5X7nlA6pw@public.gmane.org>
2010-10-14 12:25       ` Jérôme Poulin
     [not found]         ` <AANLkTi=hk=xSFLddi5+YOpZNPPDR7rO9Y+zF8N3+Wcdy-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-14 12:55           ` Dwight Schauer
     [not found]             ` <AANLkTik7tjKVBrJ_P83sitOVzeGk70AQCJF_CrwG6hYU-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-18  2:57               ` Ryusuke Konishi
2012-10-26 21:52 Gandalf Corvotempesta
2012-10-29 20:21 ` Dan Mick
2012-10-30 11:35   ` Gandalf Corvotempesta
2012-10-30 13:15     ` Gregory Farnum
2012-10-30 13:36       ` Gandalf Corvotempesta
2012-10-30 13:38         ` Stefan Priebe - Profihost AG
2012-10-30 13:45           ` Gregory Farnum
2012-10-30 20:32             ` Stefan Priebe
2012-10-30 13:40         ` Gregory Farnum
2012-10-30 13:57           ` Gandalf Corvotempesta
2012-10-30 14:17             ` 袁冬
2012-10-30 14:31               ` Gandalf Corvotempesta
2012-10-30 14:38                 ` 袁冬
2012-10-30 14:59                   ` Gandalf Corvotempesta
2012-10-30 21:06                     ` Dan Mick
2012-10-30 21:17                       ` Gandalf Corvotempesta
2012-10-30 21:21                         ` Sage Weil
