* Ceph on just two nodes being clients - reasonable?
@ 2011-01-19 10:33 Tomasz Chmielewski
2011-01-19 11:30 ` DongJin Lee
2011-01-19 11:41 ` Wido den Hollander
0 siblings, 2 replies; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 10:33 UTC (permalink / raw)
To: ceph-devel
Is it reasonable to set up Ceph on two nodes, which are Ceph clients at
the same time?
Say, we have two machines:
ceph1 -- ceph2
On each of them, Ceph filesystem is mounted in /shared, which is used by
services like a webserver or a mailserver.
Is it a reasonable approach?
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 10:33 Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
@ 2011-01-19 11:30 ` DongJin Lee
2011-01-19 11:41 ` Wido den Hollander
1 sibling, 0 replies; 11+ messages in thread
From: DongJin Lee @ 2011-01-19 11:30 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: ceph-devel
On Wed, Jan 19, 2011 at 11:33 PM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at the
> same time?
>
>
> Say, we have two machines:
>
> ceph1 -- ceph2
>
>
> On each of them, Ceph filesystem is mounted in /shared, which is used by
> services like a webserver or a mailserver.
>
> Is it a reasonable approach?
Somehow I could not get reliable benchmarks (the node kept freezing) when
the ceph client was running on the same machine as the daemons.
It might have been fixed by now, but you will probably also need fairly
powerful machines.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 10:33 Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
2011-01-19 11:30 ` DongJin Lee
@ 2011-01-19 11:41 ` Wido den Hollander
2011-01-19 11:55 ` would you recommend me a solution to store xen-imgfile Longguang Yue
2011-01-19 12:14 ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
1 sibling, 2 replies; 11+ messages in thread
From: Wido den Hollander @ 2011-01-19 11:41 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: ceph-devel
Hi Tomasz,
I think the answer to this question is both yes and no; the devs might
have another approach for your situation.
If you did this, you would run a MON, an MDS and an OSD on every server;
in theory that would work. Mounting would be done by connecting to one of
the MONs (it doesn't matter which one).
But Ceph requires - well, advises - an odd number of monitors (source:
http://ceph.newdream.net/wiki/Designing_a_cluster )
So you would need a third node running your third monitor to keep track
of both nodes.
My advice for two nodes: use something like DRBD in Primary <> Primary
mode with a cluster filesystem like OCFS2 on top.
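For reference, a dual-primary DRBD resource with OCFS2 on top could be sketched roughly like this (hostnames, device and disk names are made up for illustration; this is not a tested config):

```shell
# /etc/drbd.d/r0.res -- hypothetical dual-primary resource for ceph1/ceph2
#
# resource r0 {
#   net { allow-two-primaries; }   # required for Primary <> Primary
#   on ceph1 { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.1:7788; meta-disk internal; }
#   on ceph2 { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.2:7788; meta-disk internal; }
# }

# then, after the initial sync, on both nodes:
drbdadm up r0
drbdadm primary r0            # both nodes may be primary at once
# and once, from either node:
mkfs.ocfs2 /dev/drbd0         # OCFS2 also needs the o2cb cluster stack running
mount -t ocfs2 /dev/drbd0 /shared
```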
Wido
On Wed, 2011-01-19 at 11:33 +0100, Tomasz Chmielewski wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at
> the same time?
>
>
> Say, we have two machines:
>
> ceph1 -- ceph2
>
>
> On each of them, Ceph filesystem is mounted in /shared, which is used by
> services like a webserver or a mailserver.
>
> Is it a reasonable approach?
>
>
* would you recommend me a solution to store xen-imgfile
2011-01-19 11:41 ` Wido den Hollander
@ 2011-01-19 11:55 ` Longguang Yue
2011-01-19 17:06 ` Sage Weil
2011-01-19 12:14 ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
1 sibling, 1 reply; 11+ messages in thread
From: Longguang Yue @ 2011-01-19 11:55 UTC (permalink / raw)
To: ceph-devel, ceph-devel-owner; +Cc: Wido den Hollander, Tomasz Chmielewski
Would you recommend a solution for storing Xen image files, with respect to:
1. stability
2. throughput
3. redundancy
Is Ceph suitable for storing Xen image files?
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 11:41 ` Wido den Hollander
2011-01-19 11:55 ` would you recommend me a solution to store xen-imgfile Longguang Yue
@ 2011-01-19 12:14 ` Tomasz Chmielewski
2011-01-19 15:57 ` Gregory Farnum
1 sibling, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 12:14 UTC (permalink / raw)
To: Wido den Hollander; +Cc: ceph-devel
On 19.01.2011 12:41, Wido den Hollander wrote:
> Hi Tomasz,
>
> I think the answer to this question is both yes and no; the devs might
> have another approach for your situation.
>
> If you did this, you would run a MON, an MDS and an OSD on every server;
> in theory that would work. Mounting would be done by connecting to one of
> the MONs (it doesn't matter which one).
>
> But Ceph requires - well, advises - an odd number of monitors (source:
> http://ceph.newdream.net/wiki/Designing_a_cluster )
>
> So you would need a third node running your third monitor to keep track
> of both nodes.
>
> My advice for two nodes: use something like DRBD in Primary <> Primary
> mode with a cluster filesystem like OCFS2 on top.
Currently I'm running glusterfs in such a scenario (two servers, each
also being a client), but I wanted to give ceph a try: glusterfs has some
performance issues with lots of small files, and ceph has some nice
features (snapshots, rbd, etc.).
--
Tomasz Chmielewski
http://wpkg.org
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 12:14 ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
@ 2011-01-19 15:57 ` Gregory Farnum
2011-01-19 16:21 ` Tomasz Chmielewski
2011-01-19 17:55 ` Colin McCabe
0 siblings, 2 replies; 11+ messages in thread
From: Gregory Farnum @ 2011-01-19 15:57 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: Wido den Hollander, ceph-devel
On Wed, Jan 19, 2011 at 4:14 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> On 19.01.2011 12:41, Wido den Hollander wrote:
>>
>> Hi Tomasz,
>>
>> I think the answer to this question is both yes and no; the devs might
>> have another approach for your situation.
>>
>> If you did this, you would run a MON, an MDS and an OSD on every server;
>> in theory that would work. Mounting would be done by connecting to one of
>> the MONs (it doesn't matter which one).
>>
>> But Ceph requires - well, advises - an odd number of monitors (source:
>> http://ceph.newdream.net/wiki/Designing_a_cluster )
>>
>> So you would need a third node running your third monitor to keep track
>> of both nodes.
>>
>> My advice for two nodes: use something like DRBD in Primary <> Primary
>> mode with a cluster filesystem like OCFS2 on top.
>
> Currently, I'm running glusterfs in such a scenario (two servers, each being
> also clients), but I wanted to give ceph a try (glusterfs has some
> performance issues with lots of small files), also because of its nice
> features (snapshots, rbd etc.).
Rather than running 3 monitors you could just put a monitor on one of
the machines -- your cluster will go down if it fails, but in a 2-node
system it's not like resilience from one-node failure would be very
helpful anyway.
However, there is a serious issue with running clients and servers on
one machine, which may or may not be a problem depending on your use
case: Deadlock becomes a significant possibility. This isn't a problem
we've come up with a good solution for, unfortunately, but imagine
you're writing a lot of files to Ceph. Ceph dutifully writes them and
the kernel dutifully caches them. You also have a lot of write
activity so the Ceph kernel client is doing local caching. Then the
kernel comes along and says "I'm low on memory! Flush stuff to disk!"
and the kernel client tries to flush it out...which involves creating
another copy of the data in memory on the same machine. Uh-oh!
Now if you use the FUSE client this won't be an issue, but your
performance also won't be so good. :/
-Greg
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 15:57 ` Gregory Farnum
@ 2011-01-19 16:21 ` Tomasz Chmielewski
2011-01-19 17:55 ` Colin McCabe
1 sibling, 0 replies; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 16:21 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Wido den Hollander, ceph-devel
On 19.01.2011 16:57, Gregory Farnum wrote:
>> Currently, I'm running glusterfs in such a scenario (two servers, each being
>> also clients), but I wanted to give ceph a try (glusterfs has some
>> performance issues with lots of small files), also because of its nice
>> features (snapshots, rbd etc.).
> Rather than running 3 monitors you could just put a monitor on one of
> the machines -- your cluster will go down if it fails, but in a 2-node
> system it's not like resilience from one-node failure would be very
> helpful anyway.
OK, I could imagine starting the monitor on just one node, e.g. with the
help of Heartbeat - so if the node with the monitor goes down, Heartbeat
starts the monitor process on the other machine.
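With v1-style Heartbeat, that could be sketched roughly as the following haresources entry (resource and path names are my guesses; note the monitor's data directory would also have to live on storage that fails over with it, e.g. a DRBD device):

```shell
# /etc/ha.d/haresources -- hypothetical single-monitor failover
# (cmon was the monitor daemon binary in Ceph of this era)
#
# ceph1 drbddisk::mon0 Filesystem::/dev/drbd1::/data/mon0::ext3 cmon
#
# i.e. the active node mounts the mon data directory and runs the monitor;
# if ceph1 dies, Heartbeat mounts the data dir on ceph2 and starts cmon there.
```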
> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility.
Sounds like the "freezes" issue mentioned by DongJin Lee?
> This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!
Uh-oh - that doesn't sound encouraging, and it will likely happen sooner
or later.
Would some sort of zero-copy help here? But perhaps it's not that easy
to solve; otherwise we wouldn't be discussing it here.
I think swapping over NFS (or iSCSI) has a similar problem ("need to
write, but the network buffer is full, so we can't write over the network
-> deadlock"), and there were some patches floating around some years ago
to solve it. Not sure what the state of that work is, or how similar it
is to Ceph, though.
Last I checked, 2 < 3, so having a budget HA solution which just needed
2 servers instead of 3 would be a great thing to have! ;)
--
Tomasz Chmielewski
http://wpkg.org
* Re: would you recommend me a solution to store xen-imgfile
2011-01-19 11:55 ` would you recommend me a solution to store xen-imgfile Longguang Yue
@ 2011-01-19 17:06 ` Sage Weil
0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2011-01-19 17:06 UTC (permalink / raw)
To: Longguang Yue; +Cc: ceph-devel, Wido den Hollander, Tomasz Chmielewski
Hi Longguang,
On Wed, 19 Jan 2011, Longguang Yue wrote:
> Would you recommend a solution for storing Xen image files, with respect to:
> 1. stability
> 2. throughput
> 3. redundancy
> Is Ceph suitable for storing Xen image files?
I would suggest using the kernel RBD driver, now present in 2.6.37. See
http://ceph.newdream.net/wiki/Rbd
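Roughly, that could look like this (image name, size and monitor address are made up; the sysfs line is the 2.6.37-era mapping interface - check the wiki page above for the exact syntax):

```shell
# create a 10 GB image in the default rbd pool
rbd create xen-guest1 --size 10240

# map it with the in-kernel driver; it appears as /dev/rbd0
echo "10.0.0.1 name=admin rbd xen-guest1" > /sys/bus/rbd/add

# hand the block device to Xen, e.g. in the domU config:
#   disk = [ 'phy:/dev/rbd0,xvda,w' ]
```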
sage
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 15:57 ` Gregory Farnum
2011-01-19 16:21 ` Tomasz Chmielewski
@ 2011-01-19 17:55 ` Colin McCabe
2011-01-19 20:03 ` Tommi Virtanen
1 sibling, 1 reply; 11+ messages in thread
From: Colin McCabe @ 2011-01-19 17:55 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Tomasz Chmielewski, Wido den Hollander, ceph-devel
On Wed, Jan 19, 2011 at 7:57 AM, Gregory Farnum <gregf@hq.newdream.net> wrote:
> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility. This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!
> Now if you use the FUSE client this won't be an issue, but your
> performance also won't be so good. :/
If you knew what the maximum memory consumption of the daemons would
be, you could use mlock to lock all those pages into memory (make them
unswappable). Then you could use rlimit to ensure that if a daemon
ever tried to allocate more than that, it would be killed.
That would prevent the scenario you outlined above, where there is not
enough memory left to flush the page cache. Of course, for this to be
feasible we would need to reduce the daemons' memory consumption and
make it deterministic.
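The rlimit half of that could be sketched as a launcher script (daemon name and figures are assumptions; mlockall() itself would need support inside the daemon, since ulimit can only raise the lockable-memory ceiling):

```shell
#!/bin/sh
# Hypothetical launcher: cap the OSD's address space so an over-allocation
# makes malloc fail (and the daemon die) instead of eating into memory the
# kernel needs to flush the page cache.
ulimit -l unlimited           # RLIMIT_MEMLOCK: allow the daemon to mlockall()
ulimit -v $((512 * 1024))     # RLIMIT_AS in KB: 512 MB cap, figure is a guess
exec cosd -c /etc/ceph/ceph.conf -i 0    # cosd: the OSD daemon of this era
```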
cheers,
Colin
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 17:55 ` Colin McCabe
@ 2011-01-19 20:03 ` Tommi Virtanen
2011-01-19 21:09 ` Colin McCabe
0 siblings, 1 reply; 11+ messages in thread
From: Tommi Virtanen @ 2011-01-19 20:03 UTC (permalink / raw)
To: Colin McCabe
Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander, ceph-devel
On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
> If you knew what the maximum memory consumption for the daemons would
> be, you could use mlock to lock all those pages into memory (make them
> unswappable.) Then you could use rlimit to ensure that if the daemon
> ever tried to allocate more than that, it would be killed.
The classic nfs loopback mount deadlock is less about how much memory
the daemons are grabbing via malloc etc, and more about the buffer
cache management in kernel.
With a "loopback ceph", pressure from activity on the kernel ceph
client mountpoint might interact badly with the buffer cache the OSD
needs to work well, whether the OSD userspace tries to limit itself or
not.
It's one of those "it'll work until you have a bad day" things.
http://www.webservertalk.com/archive242-2007-10-2051163.html
https://bugzilla.redhat.com/show_bug.cgi?id=489889
http://lkml.org/lkml/2006/12/14/448
http://docs.google.com/viewer?a=v&q=cache:ONtIKJFSC7QJ:https://tao.truststc.org/Members/hweather/advanced_storage/Public%2520resources/network/nfs_user+nfs+loopback+deadlock+linux&hl=en&gl=us&pid=bl&srcid=ADGEESgpaVYYNoh2pmvPVQ9I_bpLLcoF3GJIMKavomIHNgTb-cbii6RVtWg28poJKdHBqQgKGXzVA2NOsC25FtWMP3yywTfNkX9N26IrKVIcVA9eRz6ZGBx1_Ur0JerUrfBQlPcmcBBz&sig=AHIEtbSjGX_hCVny345iFSq7WKBvxNZmIw
(slide 5)
--
:(){ :|:&};:
* Re: Ceph on just two nodes being clients - reasonable?
2011-01-19 20:03 ` Tommi Virtanen
@ 2011-01-19 21:09 ` Colin McCabe
0 siblings, 0 replies; 11+ messages in thread
From: Colin McCabe @ 2011-01-19 21:09 UTC (permalink / raw)
To: Tommi Virtanen
Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander, ceph-devel
On Wed, Jan 19, 2011 at 12:03 PM, Tommi Virtanen
<tommi.virtanen@dreamhost.com> wrote:
> On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
>> If you knew what the maximum memory consumption for the daemons would
>> be, you could use mlock to lock all those pages into memory (make them
>> unswappable.) Then you could use rlimit to ensure that if the daemon
>> ever tried to allocate more than that, it would be killed.
>
> The classic nfs loopback mount deadlock is less about how much memory
> the daemons are grabbing via malloc etc, and more about the buffer
> cache management in kernel.
My understanding is that nfsd tries to allocate memory, which turns
out to be impossible because the page cache is occupying that memory -
and freeing that memory requires nfsd itself to make progress: a deadlock.
I guess the question you are asking is whether nfsd just doing I/O
requires kernel memory that might not be available. I'm not entirely
sure about the answer to that. Unfortunately, none of those links has
any information on the subject (I had high hopes for the lkml one, but
it was about an unrelated race in NFS).
Colin