* how can I achieve HA with ceph?
@ 2011-12-18 19:26 Karoly Horvath
  2011-12-19 17:40 ` Tommi Virtanen
  2011-12-19 22:50 ` Gregory Farnum
  0 siblings, 2 replies; 13+ messages in thread
From: Karoly Horvath @ 2011-12-18 19:26 UTC (permalink / raw)
  To: ceph-devel

Hi Guys,

two questions:

first one is short:
The documentation states that all the daemons have to run in odd
numbers to work correctly.
But what happens if one of the nodes is down? Then, by definition
there will be an even number of daemons.
Can the system tolerate this failure? If not, do I have to automate
the process of quickly bringing up a new node to achieve HA?

the second one:
I have a simple configuration:

ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
xxx.xxx.xxx.31 alpha (mds, mon, osd)
xxx.xxx.xxx.33 beta  (mds, mon, osd)
xxx.xxx.xxx.35 gamma (     mon, osd)
The ceph FS is mounted with both mds-es listed.
I set the 'data' and 'metadata' replication to 2, then tested with 3.

I've read the documentation and it suggests this should be enough to
achieve high availability.
The data is replicated on all the osd-s (3) and there is at least 1 mds
up all the time... yet:

Each time I remove the power plug from the primary mds node's host,
the system goes down and I cannot do a simple `ls`.
I can reproduce this problem and send you any logfiles or ceph -w
output you need; just let me know.
Here is an example session: http://pastebin.com/R4MgdhUy

I once saw the standby mds wake up and then the FS worked, but that
was after 20 minutes, which is way too long for an HA scenario.

There is hardly any data on the FS at the moment (400MB, lol..), and
hardly any writes...

I'm willing to sacrifice (a lot of) performance to achieve high availability.
Let me know if there are configuration settings to achieve this.

Thanks.

-- 
Karoly Horvath
rhswdev@gmail.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-18 19:26 how can I achieve HA with ceph? Karoly Horvath
@ 2011-12-19 17:40 ` Tommi Virtanen
  2011-12-19 22:50 ` Gregory Farnum
  1 sibling, 0 replies; 13+ messages in thread
From: Tommi Virtanen @ 2011-12-19 17:40 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Sun, Dec 18, 2011 at 11:26, Karoly Horvath <rhswdev@gmail.com> wrote:
> The documentation states for all the daemons that they have to be an
> odd number to work correctly.
> But what happens if one of the nodes is down? Then, by definition
> there will be an even number of daemons.
> Can the system tolerate this failure? If not, do I have to automate
> the process of quickly bringing up a new node to achieve HA?

Not all of them; just the ceph-mon daemons should run in an odd
number. Monitors know how many monitors there are, even when one of
them is down. As long as a majority of them is available, they can
operate normally.
With 3, you can temporarily lose 1 and keep operating. With 5, you can
lose 2. With 7, you can lose 3.

If you permanently lose one of the machines running ceph-mon, see
http://ceph.newdream.net/docs/latest/ops/manage/grow/mon/ for how to
remove it, and add a new daemon elsewhere.

Everything else can run in whatever number you want; naturally, just
1 doesn't give you any HA.

With the mds, we currently recommend running only 1 in active mode
and the rest in standby.
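
For reference, a layout like yours would look roughly like this in
ceph.conf. This is just a sketch, not a tested config; the xxx
addresses are the placeholders from your mail, and the osd data paths
are omitted:

[mon.alpha]
    host = alpha
    mon addr = xxx.xxx.xxx.31:6789
[mon.beta]
    host = beta
    mon addr = xxx.xxx.xxx.33:6789
[mon.gamma]
    host = gamma
    mon addr = xxx.xxx.xxx.35:6789

[mds.alpha]
    host = alpha
[mds.beta]
    host = beta

[osd.0]
    host = alpha
[osd.1]
    host = beta
[osd.2]
    host = gamma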

> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
> xxx.xxx.xxx.31 alpha (mds, mon, osd)
> xxx.xxx.xxx.33 beta  (mds, mon, osd)
> xxx.xxx.xxx.35 gamma (     mon, osd)
> ceph FS is mounted with listing the two mds-es.
> I set 'data' and 'metadata' to 2, then tested with 3.
>
> I've read the documentation and it suggests this should be enough to
> achieve High Availability.
> The data is replicated on all the osd-s (3), there is at least 1 mds
> up all the time...yet:
>
> Each time I remove the power plug from the primary mds node's host,
> the system goes down and I cannot do a simple `ls`.
> I can replicate this problem and send you any logfiles or ceph -w
> outputs you need. Let me know what you need.
> Here is an example session: http://pastebin.com/R4MgdhUy
>
> I once saw the standby mds to wake up and then the FS worked but that
> was after 20 minutes, which is way too long for a HA scenario.

> I'm willing to sacrifice (a lot of) performance to achieve high availability.
> Let me know if there are configuration settings to achieve this.

The paste shows the second mds is a standby; that's good.

There's a timeout before the standby becomes active. That timeout
might be too long to suit your needs. Hopefully someone else from the
team who's actually worked on the mds will confirm this, but it looks
like the relevant config setting is mds_beacon_grace (default 15, the
unit seems to be seconds). I'm not sure what's going on there.

See if you can speed up the standby becoming active with "ceph mds
fail beta" (or whatever node you took down). If that makes it happen
fast, then we can figure out the timers; if that doesn't make the
failover happen, then there's something wrong in the setup.
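
Something like the following in ceph.conf is what I would try for
shortening that window; I haven't verified which daemon actually
enforces the grace period, so [global] is the safe place to set it,
and the value here is just an example:

[global]
    mds beacon grace = 5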

Most of the QA is currently focusing on rados, radosgw, and rbd, so
we're not actively running these kinds of tests on the mds component
right now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-18 19:26 how can I achieve HA with ceph? Karoly Horvath
  2011-12-19 17:40 ` Tommi Virtanen
@ 2011-12-19 22:50 ` Gregory Farnum
  2011-12-20 18:07   ` Karoly Horvath
  1 sibling, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2011-12-19 22:50 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Sun, Dec 18, 2011 at 11:26 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Hi Guys,
>
> two questions:
>
> first one is short:
> The documentation states for all the daemons that they have to be an
> odd number to work correctly.
> But what happens if one of the nodes is down? Then, by definition
> there will be an even number of daemons.
> Can the system tolerate this failure? If not, do I have to automate
> the process of quickly bringing up a new node to achieve HA?
>
> the second one:
> I have a simple configuration:
>
> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
> xxx.xxx.xxx.31 alpha (mds, mon, osd)
> xxx.xxx.xxx.33 beta  (mds, mon, osd)
> xxx.xxx.xxx.35 gamma (     mon, osd)
> ceph FS is mounted with listing the two mds-es.
> I set 'data' and 'metadata' to 2, then tested with 3.
>
> I've read the documentation and it suggests this should be enough to
> achieve High Availability.
> The data is replicated on all the osd-s (3), there is at least 1 mds
> up all the time...yet:
>
> Each time I remove the power plug from the primary mds node's host,
> the system goes down and I cannot do a simple `ls`.
> I can replicate this problem and send you any logfiles or ceph -w
> outputs you need. Let me know what you need.
> Here is an example session: http://pastebin.com/R4MgdhUy
>
> I once saw the standby mds to wake up and then the FS worked but that
> was after 20 minutes, which is way too long for a HA scenario.
>
> There is hardly any data on the FS at the moment (400MB, lol..), and
> hardly any writes...
>
> I'm willing to sacrifice (a lot of) performance to achieve high availability.
> Let me know if there are configuration settings to achieve this.

TV's right. Specifically regarding the MDS: it should time out and get
replaced within 30 seconds (this is controlled by the "mds beacon
grace" setting). It is a failover procedure, not shared masters or
anything like that, but in my experience it takes on the order of 10
seconds to complete once the failure is detected. 20 minutes is
completely wrong. If we're not already talking elsewhere, I'd like you
to enable MDS logging, reproduce this, and post the logs somewhere so I
can check out what's going on.
-Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-19 22:50 ` Gregory Farnum
@ 2011-12-20 18:07   ` Karoly Horvath
  2011-12-20 22:50     ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Karoly Horvath @ 2011-12-20 18:07 UTC (permalink / raw)
  To: ceph-devel

Hi,
All tests were made with kill -9, killing the active mds (and sometimes
other processes). I waited a couple of minutes between each test to
make sure that the cluster reached a stable state. (BTW: how can I
check this programmatically?)
#  KILLED           result1. mds @ beta       OK2. mds @ alpha
OK3. mds+osd @ beta   FAILED                    switch ok
{0=alpha=up:active}, but FS not readable                    FS
permanently freezed                    rebooted the whole cluster4.
mds+mon @ alpha  OK (32 sec)                    rebooted the whole
cluster5. mds+osd @ beta   OK (25 sec)                    rebooted the
whole cluster6. mds+osd @ beta   OK (24 sec)7. mds+osd @ alpha  OK (30
sec)8. mds+mon+osd @ beta  OK (27 sec)9. power unplug @ alpha FAILED
                 stuck in {0=beta=up:replay} for a long time
         finally it's switching to {0=alpha=up:active}, but FS not
readable                    FS permanently freezed, even when bringing
up alpha...
I uploaded the test results here:
http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?
If you need any other configuration options changed, let me know.
The logs were created with:

mkdir -p $LOGDIR
tail -f /var/log/ceph/mds.*.log >$LOGDIR/mds.log &
p1=$!
tail -f /var/log/ceph/mon.*.log >$LOGDIR/mon.log &
p2=$!
tail -f /var/log/ceph/osd.*.log >$LOGDIR/osd.log &
p3=$!
ceph -w >$LOGDIR/ceph.log &
p4=$!

read line
kill $p1 $p2 $p3 $p4

# anonymize IP addresses
for f in $LOGDIR/*.log; do
    sed -r -i 's/[0-9]+\.[0-9]+\.[0-9]+\.([0-9]+)/xxx.xxx.xxx.\1/g' $f
done
--
Karoly Horvath
rhswdev@gmail.com

On Mon, Dec 19, 2011 at 10:50 PM, Gregory Farnum
<gregory.farnum@dreamhost.com> wrote:
> On Sun, Dec 18, 2011 at 11:26 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>> Hi Guys,
>>
>> two questions:
>>
>> first one is short:
>> The documentation states for all the daemons that they have to be an
>> odd number to work correctly.
>> But what happens if one of the nodes is down? Then, by definition
>> there will be an even number of daemons.
>> Can the system tolerate this failure? If not, do I have to automate
>> the process of quickly bringing up a new node to achieve HA?
>>
>> the second one:
>> I have a simple configuration:
>>
>> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
>> xxx.xxx.xxx.31 alpha (mds, mon, osd)
>> xxx.xxx.xxx.33 beta  (mds, mon, osd)
>> xxx.xxx.xxx.35 gamma (     mon, osd)
>> ceph FS is mounted with listing the two mds-es.
>> I set 'data' and 'metadata' to 2, then tested with 3.
>>
>> I've read the documentation and it suggests this should be enough to
>> achieve High Availability.
>> The data is replicated on all the osd-s (3), there is at least 1 mds
>> up all the time...yet:
>>
>> Each time I remove the power plug from the primary mds node's host,
>> the system goes down and I cannot do a simple `ls`.
>> I can replicate this problem and send you any logfiles or ceph -w
>> outputs you need. Let me know what you need.
>> Here is an example session: http://pastebin.com/R4MgdhUy
>>
>> I once saw the standby mds to wake up and then the FS worked but that
>> was after 20 minutes, which is way too long for a HA scenario.
>>
>> There is hardly any data on the FS at the moment (400MB, lol..), and
>> hardly any writes...
>>
>> I'm willing to sacrifice (a lot of) performance to achieve high availability.
>> Let me know if there are configuration settings to achieve this.
> TV's right. Specifically regarding the MDS:
> As TV said, the MDS should time out and get replaced within 30 seconds
> (this is controlled by the "mds beacon grace" setting). It is a
> failover procedure, not shared masters or something, but from my
> experience it takes on the order of 10 seconds to complete once
> failure is detected. 20 minutes is completely wrong. If we're not
> already talking elsewhere, then I'd like it if you could enable MDS
> logging and reproduce this and post the logs somewhere so I can check
> out what's going on.
> -Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-20 18:07   ` Karoly Horvath
@ 2011-12-20 22:50     ` Gregory Farnum
  2011-12-20 23:45       ` Karoly Horvath
       [not found]       ` <CA+o_KfsG=_TvQVNJ1HcUcTvntN5gqKCC3eXFhYwZVj3_fF4wRg@mail.gmail.com>
  0 siblings, 2 replies; 13+ messages in thread
From: Gregory Farnum @ 2011-12-20 22:50 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Tue, Dec 20, 2011 at 10:07 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Hi,
> all test were made with kill -9, killing the active mds (and sometimes
> other processes).I waited a couple of minutes between each test to
> make sure that the cluster reached a stable state.(btw: how can I
> check this programmatically?)
You can run "ceph health", which has only a few different values you
can look for. :)
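
If you want to script the wait, something along these lines should
work (assuming the healthy output on your build is the literal string
HEALTH_OK):

while ! ceph health | grep -q HEALTH_OK; do
    sleep 5
done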

> #  KILLED           result1. mds @ beta       OK2. mds @ alpha
> OK3. mds+osd @ beta   FAILED                    switch ok
> {0=alpha=up:active}, but FS not readable                    FS
> permanently freezed                    rebooted the whole cluster4.
> mds+mon @ alpha  OK (32 sec)                    rebooted the whole
> cluster5. mds+osd @ beta   OK (25 sec)                    rebooted the
> whole cluster6. mds+osd @ beta   OK (24 sec)7. mds+osd @ alpha  OK (30
> sec)8. mds+mon+osd @ beta  OK (27 sec)9. power unplug @ alpha FAILED
>                  stuck in {0=beta=up:replay} for a long time
>          finally it's switching to {0=alpha=up:active}, but FS not
> readable                    FS permanently freezed, even when bringing
> up alpha...
Your formatting got pretty mangled here, and I'm still not sure what's
going on. Did you restart all the daemons between each kill attempt?
(for instance, it looks like '1' is to kill mds.beta; '2' is to kill
mds.alpha, and then '3' is to kill mds.beta — but you already did
that)

> I uploaded test results here:
> http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?
> If you need any other configuration options changed, let me know
Sorry, I should have been clearer when I said turn on mds logging. Add
"debug mds = 20" and "debug ms = 1" lines to your ceph.conf MDS
sections. This will spit out a lot more information about what's going
on internally, which will help us diagnose this. :)
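
That is, something like this (per-daemon or in the shared [mds]
section):

[mds]
    debug mds = 20
    debug ms = 1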

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-20 22:50     ` Gregory Farnum
@ 2011-12-20 23:45       ` Karoly Horvath
       [not found]       ` <CA+o_KfsG=_TvQVNJ1HcUcTvntN5gqKCC3eXFhYwZVj3_fF4wRg@mail.gmail.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Karoly Horvath @ 2011-12-20 23:45 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Sorry about the formatting; here it is again. I hope it's readable now.

For each test, the list shows which services I killed on which node.
After each test I restored all services.

1. mds @ beta       OK

2. mds @ alpha      OK


3. mds+osd @ beta  FAILED
   switch ok {0=alpha=up:active}, but FS not readable
   FS permanently frozen

rebooted the whole cluster

4. mds+mon @ alpha  OK (32 sec)


rebooted the whole cluster

5. mds+osd @ beta   OK (25 sec)

rebooted the whole cluster

6. mds+osd @ beta   OK (24 sec)

7. mds+osd @ alpha  OK (30 sec)

8. mds+mon+osd @ beta  OK (27 sec)

9. power unplug @ alpha FAILED
   stuck in {0=beta=up:replay} for a long time
   finally it switched to {0=alpha=up:active}, but FS not readable
   FS permanently frozen, even after bringing alpha back up...

I included all the tests to show what worked and what didn't.
Note that the mds+osd kill worked most of the time, but there was also
a problematic test.
Also note that the power unplug test FAILED every time; I included
only one test.


On Tue, Dec 20, 2011 at 10:50 PM, Gregory Farnum
<gregory.farnum@dreamhost.com> wrote:
> On Tue, Dec 20, 2011 at 10:07 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>> Hi,
>> all test were made with kill -9, killing the active mds (and sometimes
>> other processes).I waited a couple of minutes between each test to
>> make sure that the cluster reached a stable state.(btw: how can I
>> check this programmatically?)
> You can run "ceph health", which has only a few different values you
> can look for. :)
>
>> #  KILLED           result1. mds @ beta       OK2. mds @ alpha
>> OK3. mds+osd @ beta   FAILED                    switch ok
>> {0=alpha=up:active}, but FS not readable                    FS
>> permanently freezed                    rebooted the whole cluster4.
>> mds+mon @ alpha  OK (32 sec)                    rebooted the whole
>> cluster5. mds+osd @ beta   OK (25 sec)                    rebooted the
>> whole cluster6. mds+osd @ beta   OK (24 sec)7. mds+osd @ alpha  OK (30
>> sec)8. mds+mon+osd @ beta  OK (27 sec)9. power unplug @ alpha FAILED
>>                  stuck in {0=beta=up:replay} for a long time
>>          finally it's switching to {0=alpha=up:active}, but FS not
>> readable                    FS permanently freezed, even when bringing
>> up alpha...
> Your formatting got pretty mangled here, and I'm still not sure what's
> going on. Did you restart all the daemons between each kill attempt?
> (for instance, it looks like '1' is to kill mds.beta; '2' is to kill
> mds.alpha, and then '3' is to kill mds.beta — but you already did
> that)
>
>> I uploaded test results here:
>> http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?
>> If you need any other configuration options changed, let me know
> Sorry, I should have been clearer when I said turn on mds logging. Add
> "debug mds = 20" and "debug ms = 1" lines to your ceph.conf MDS
> sections. This will spit out a lot more information about what's going
> on internally, which will help us diagnose this. :)

I had those lines and the log seemed to be quite verbose... let me know
if it didn't work.

-- 
Karoly Horvath
rhswdev@gmail.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
       [not found]       ` <CA+o_KfsG=_TvQVNJ1HcUcTvntN5gqKCC3eXFhYwZVj3_fF4wRg@mail.gmail.com>
@ 2011-12-21  0:03         ` Gregory Farnum
       [not found]           ` <CA+o_KfuigaFGYvHVOk_W-gpTQryheYfc4=9v5nxemUyqhH-GQw@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2011-12-21  0:03 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Tue, Dec 20, 2011 at 3:42 PM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Sorry about the formatting, here it is again, I hope it's readable now
>
> for each test it shows which services I killed on which node. after the test
> I restored all services.
>
> 1. mds @ beta       OK
>
> 2. mds @ alpha      OK
>
>
> 3. mds+osd @ beta  FAILED
>    switch ok {0=alpha=up:active}, but FS not readable
>    FS permanently freezed
>
> rebooted the whole cluster
>
> 4. mds+mon @ alpha  OK (32 sec)
>
>
> rebooted the whole cluster
>
> 5. mds+osd @ beta   OK (25 sec)
>
> rebooted the whole cluster
>
> 6. mds+osd @ beta   OK (24 sec)
>
> 7. mds+osd @ alpha  OK (30 sec)
>
> 8. mds+mon+osd @ beta  OK (27 sec)
>
> 9. power unplug @ alpha FAILED
>    stuck in {0=beta=up:replay} for a long time
>    finally it's switching to {0=alpha=up:active}, but FS not readable
>    FS permanently freezed, even when bringing up alpha...
>
> I included all the tests to show what worked and what didn't.
> note that the mds+osd worked most of the time but there was also a
> problematic test.
> also note that the power unplug test FAILED all the time, I included only
> one test.
>
>
> On Tue, Dec 20, 2011 at 10:50 PM, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>> On Tue, Dec 20, 2011 at 10:07 AM, Karoly Horvath <rhswdev@gmail.com>
>> wrote:
>>> Hi,
>>> all test were made with kill -9, killing the active mds (and sometimes
>>> other processes).I waited a couple of minutes between each test to
>>> make sure that the cluster reached a stable state.(btw: how can I
>>> check this programmatically?)
>> You can run "ceph health", which has only a few different values you
>> can look for. :)
>>
>>> #  KILLED           result1. mds @ beta       OK2. mds @ alpha
>>> OK3. mds+osd @ beta   FAILED                    switch ok
>>> {0=alpha=up:active}, but FS not readable                    FS
>>> permanently freezed                    rebooted the whole cluster4.
>>> mds+mon @ alpha  OK (32 sec)                    rebooted the whole
>>> cluster5. mds+osd @ beta   OK (25 sec)                    rebooted the
>>> whole cluster6. mds+osd @ beta   OK (24 sec)7. mds+osd @ alpha  OK (30
>>> sec)8. mds+mon+osd @ beta  OK (27 sec)9. power unplug @ alpha FAILED
>>>                  stuck in {0=beta=up:replay} for a long time
>>>          finally it's switching to {0=alpha=up:active}, but FS not
>>> readable                    FS permanently freezed, even when bringing
>>> up alpha...
>> Your formatting got pretty mangled here, and I'm still not sure what's
>> going on. Did you restart all the daemons between each kill attempt?
>> (for instance, it looks like '1' is to kill mds.beta; '2' is to kill
>> mds.alpha, and then '3' is to kill mds.beta — but you already did
>> that)
>>
>>> I uploaded test results here:
>>> http://www.4shared.com/file/5nXMw_sM/cephlogs_mds_test.html?
>>> If you need any other configuration options changed, let me know
>> Sorry, I should have been clearer when I said turn on mds logging. Add
>> "debug mds = 20" and "debug ms = 1" lines to your ceph.conf MDS
>> sections. This will spit out a lot more information about what's going
>> on internally, which will help us diagnose this. :)
>
> I had those lines, the log seemed to be quite verbose... let me know if it
> didn't work.

It looks like maybe you got it turned on in the mon section rather
than the mds or global sections. :)


However, as I look at these a little more it generally looks good,
even in trial 3. The only alarming thing that's present is a note that
two of your clients failed to reconnect to the MDS in time and were
cut off. Did you try establishing a new connection to the cluster and
seeing if that worked? It's possible there's a client bug, or that
there was some sort of network error that interfered with them.
-Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
       [not found]           ` <CA+o_KfuigaFGYvHVOk_W-gpTQryheYfc4=9v5nxemUyqhH-GQw@mail.gmail.com>
@ 2011-12-21 16:13             ` Gregory Farnum
       [not found]               ` <CA+o_KfvFnNqhiYuySZnhz7jdhj=MXgpiGqZA+D7Dw+fqZ7VNjA@mail.gmail.com>
       [not found]               ` <CA+o_Kft7RWEanbBM8ZUMTaz0ucr-XEOc1tJHK7i304XnspmRyA@mail.gmail.com>
  0 siblings, 2 replies; 13+ messages in thread
From: Gregory Farnum @ 2011-12-21 16:13 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

[Re-added list]

On Wed, Dec 21, 2011 at 4:33 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> On Wed, Dec 21, 2011 at 12:03 AM, Gregory Farnum
>> It looks like maybe you got it turned on in the mon section rather
>> than the mds or global sections. :)
>
> right
>
>> However, as I look at these a little more it generally looks good,
>> even in trial 3. The only alarming thing that's present is a note that
>> two of your clients failed to reconnect to the MDS in time and were
>> cut off. Did you try establishing a new connection to the cluster and
>> seeing if that worked? It's possible there's a client bug, or that
>> there was some sort of network error that interfered with them.
>> -Greg
>
> By client I assume you mean the kernel driver.. the FS is freezed, so
> I cannot unmount (cannot even `shutdown`).. how can I force the client
> to reconnect?

Try a lazy force unmount:
umount -lf ceph_mnt_point/
And then mount again.
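
Roughly like this; the mount point, monitor address, and auth options
are just placeholders for whatever you normally use:

umount -lf /mnt/ceph
mount -t ceph xxx.xxx.xxx.31:6789:/ /mnt/ceph -o name=admin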

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
       [not found]               ` <CA+o_KfvFnNqhiYuySZnhz7jdhj=MXgpiGqZA+D7Dw+fqZ7VNjA@mail.gmail.com>
@ 2011-12-23  0:00                 ` Gregory Farnum
  2012-01-05 13:24                   ` Karoly Horvath
  0 siblings, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2011-12-23  0:00 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum
>>> By client I assume you mean the kernel driver.. the FS is freezed, so
>>> I cannot unmount (cannot even `shutdown`).. how can I force the client
>>> to reconnect?
>>
>> Try a lazy force unmount:
>> umount -lf ceph_mnt_point/
>> And then mount again.
>
> wow, never heard about this, thanks.:)
> will report with the next mail
>
> In the meantime I did one test, killing mds+osd+mon on beta,
> it's jammed in '{0=alpha=up:replay}', after 45 minutes I shut it down...
> I attached the logs.

Oh, this is very odd! The MDS goes to sleep while it waits for an
up-to-date OSDMap, but it never seems to get woken up, even though I
can see the OSDMap message being sent.

So let's try this one more time, but this time also add in "debug
objecter = 20" to the MDS config...Those logs will include everything
I need, or nothing will, promise! :)
-Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2011-12-23  0:00                 ` Gregory Farnum
@ 2012-01-05 13:24                   ` Karoly Horvath
  2012-01-05 19:06                     ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Karoly Horvath @ 2012-01-05 13:24 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Hi,

back from holiday.

I did a successful power unplug test now, but the FS was unavailable
for 16 minutes, which is clearly wrong...

I have the log files, but the MDS log is 1.2 gigabytes; if you let me
know which lines to filter in / out, I will upload it somewhere...

-- 
Karoly Horvath


On Fri, Dec 23, 2011 at 12:00 AM, Gregory Farnum
<gregory.farnum@dreamhost.com> wrote:
> On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum
>>>> By client I assume you mean the kernel driver.. the FS is freezed, so
>>>> I cannot unmount (cannot even `shutdown`).. how can I force the client
>>>> to reconnect?
>>>
>>> Try a lazy force unmount:
>>> umount -lf ceph_mnt_point/
>>> And then mount again.
>>
>> wow, never heard about this, thanks.:)
>> will report with the next mail
>>
>> In the meantime I did one test, killing mds+osd+mon on beta,
>> it's jammed in '{0=alpha=up:replay}', after 45 minutes I shut it down...
>> I attached the logs.
>
> Oh, this is very odd! The MDS goes to sleep while it waits for an
> up-to-date OSDMap, but it never seems to get woken up even though I
> see the message sending in the OSDMap.
>
> So let's try this one more time, but this time also add in "debug
> objecter = 20" to the MDS config...Those logs will include everything
> I need, or nothing will, promise! :)
> -Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
  2012-01-05 13:24                   ` Karoly Horvath
@ 2012-01-05 19:06                     ` Gregory Farnum
       [not found]                       ` <CA+o_KftBMHMqk4zRKp3z-fe+KBSjRsoon4OCbc47bWdjQQTQ=w@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2012-01-05 19:06 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Thu, Jan 5, 2012 at 5:24 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Hi,
>
> back from holiday.
>
> I did a successful power unplug test now, but the FS was unavailable
> for 16 minutes which is clearly wrong...
>
> I have the log files but the MDS log is 1.2 gigabyte, if you let me
> know which lines to filter / filter out I will  upload it somewhere...
>
> --
> Karoly Horvath

Assuming it's the same error as last time, the log will have a line
that contains "waiting for osdmap n (which blacklists prior
instance)", where n is an epoch number.

Then at some later point there will be a line that looks something
like the following:
"2011-12-21 13:45:17.594746 7f4885307700 -- xxx.xxx.xxx.31:6800/4438
<== mon.2 xxx.xxx.xxx.35:6789/0 9 ==== osd_map(y..z src has 1..495) v2
==== 748+0+0 (656995691 0 0) 0x1637400 con 0x163c000"
Where y and z are an interval which contains n. (In the previous log,
and probably here too, y=z=n.) I'm going to be interested in those two
lines and the stuff following when the osdmap arrives. Probably I will
only care about "objecter" lines, but it might be all of them...try
trimming off the minute following that osdmap line; it'll probably
contain more than I care about. :)
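
Something along these lines should pull those lines out (the filename
here is hypothetical; use whatever your MDS log is called):

grep -n "waiting for osdmap" mds.alpha.log
grep -n "osd_map(" mds.alpha.log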
-Greg


> On Fri, Dec 23, 2011 at 12:00 AM, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>> On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>>> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum
>>>>> By client I assume you mean the kernel driver.. the FS is freezed, so
>>>>> I cannot unmount (cannot even `shutdown`).. how can I force the client
>>>>> to reconnect?
>>>>
>>>> Try a lazy force unmount:
>>>> umount -lf ceph_mnt_point/
>>>> And then mount again.
>>>
>>> wow, never heard about this, thanks.:)
>>> will report with the next mail
>>>
>>> In the meantime I did one test, killing mds+osd+mon on beta,
>>> it's jammed in '{0=alpha=up:replay}', after 45 minutes I shut it down...
>>> I attached the logs.
>>
>> Oh, this is very odd! The MDS goes to sleep while it waits for an
>> up-to-date OSDMap, but it never seems to get woken up even though I
>> see the message sending in the OSDMap.
>>
>> So let's try this one more time, but this time also add in "debug
>> objecter = 20" to the MDS config...Those logs will include everything
>> I need, or nothing will, promise! :)
>> -Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
       [not found]                       ` <CA+o_KftBMHMqk4zRKp3z-fe+KBSjRsoon4OCbc47bWdjQQTQ=w@mail.gmail.com>
@ 2012-01-10  1:06                         ` Gregory Farnum
  0 siblings, 0 replies; 13+ messages in thread
From: Gregory Farnum @ 2012-01-10  1:06 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Fri, Jan 6, 2012 at 4:36 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Hi,
>
> no, this is a different problem, this time the failover was successful.

Aha, a different problem indeed! I assume that mon.1 was located on
beta (the computer you killed)? It turns out that the standby MDS was
connected to the monitor you killed, and the MDS took a long time to
time out its connection so it didn't find out it needed to go active
until after the default connection timeout period had elapsed.

I've created a bug to track this issue:
http://tracker.newdream.net/issues/1912 (I will push a fix for it to
master tonight or tomorrow).
In the meantime, you can work around it by running only one monitor
and not killing the node it's on; if you can try to reproduce the
previous issue, that one is more interesting! :)
-Greg

> 2012-01-05 12:49:48.185815   mds e376: 1/1/1 up {0=beta=up:active}, 1 up:standby
> 2012-01-05 12:50:32.200055   mds e377: 1/1/1 up
> {0=alpha=up:replay(laggy or crashed)}
> 2012-01-05 13:05:09.800119 7fd192bfa700 mds.0.55  waiting for osdmap
> 568 (which blacklists prior instance)
> 2012-01-05 13:06:07.851253 7fd192bfa700 mds.0.55 request_state up:active
> 2012-01-05 13:06:07.851259 7fd192bfa700 mds.0.55 beacon_send up:active
> seq 526 (currently up:rejoin)
>
> It took 15 minutes to get to the point where it prints "waiting for
> osdmap". After that it was quite fast. I hope someone will find the
> problem...
>
> --
> Karoly Horvath
> rhswdev@gmail.com
>
>
> On Thu, Jan 5, 2012 at 7:06 PM, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>> On Thu, Jan 5, 2012 at 5:24 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>>> Hi,
>>>
>>> back from holiday.
>>>
>>> I did a successful power unplug test now, but the FS was unavailable
>>> for 16 minutes which is clearly wrong...
>>>
>>> I have the log files but the MDS log is 1.2 gigabyte, if you let me
>>> know which lines to filter / filter out I will  upload it somewhere...
>>>
>>> --
>>> Karoly Horvath
>>
>> Assuming it's the same error as last time, the log will have a line
>> that contains "waiting for osdmap n (which blacklists prior
>> instance)", where n is an epoch number.
>>
>> Then at some later point there will be a line that looks something
>> like the following:
>> "2011-12-21 13:45:17.594746 7f4885307700 -- xxx.xxx.xxx.31:6800/4438
>> <== mon.2 xxx.xxx.xxx.35:6789/0 9 ==== osd_map(y..z src has 1..495) v2
>> ==== 748+0+0 (656995691 0 0) 0x1637400 con 0x163c000"
>> Where y and z are an interval which contains n. (In the previous log,
>> and probably here too, y=z=n.) I'm going to be interested in those two
>> lines and the stuff following when the osdmap arrives. Probably I will
>> only care about "objecter" lines, but it might be all of them...try
>> trimming off the minute following that osdmap line; it'll probably
>> contain more than I care about. :)
>> -Greg
>>
>>
>>> On Fri, Dec 23, 2011 at 12:00 AM, Gregory Farnum
>>> <gregory.farnum@dreamhost.com> wrote:
>>>> On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>>>>> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum
>>>>>>> By client I assume you mean the kernel driver.. the FS is freezed, so
>>>>>>> I cannot unmount (cannot even `shutdown`).. how can I force the client
>>>>>>> to reconnect?
>>>>>>
>>>>>> Try a lazy force unmount:
>>>>>> umount -lf ceph_mnt_point/
>>>>>> And then mount again.
>>>>>
>>>>> wow, never heard about this, thanks.:)
>>>>> will report with the next mail
>>>>>
>>>>> In the meantime I did one test, killing mds+osd+mon on beta,
>>>>> it's jammed in '{0=alpha=up:replay}', after 45 minutes I shut it down...
>>>>> I attached the logs.
>>>>
>>>> Oh, this is very odd! The MDS goes to sleep while it waits for an
>>>> up-to-date OSDMap, but it never seems to get woken up even though I
>>>> see the message sending in the OSDMap.
>>>>
>>>> So let's try this one more time, but this time also add in "debug
>>>> objecter = 20" to the MDS config...Those logs will include everything
>>>> I need, or nothing will, promise! :)
>>>> -Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: how can I achieve HA with ceph?
       [not found]               ` <CA+o_Kft7RWEanbBM8ZUMTaz0ucr-XEOc1tJHK7i304XnspmRyA@mail.gmail.com>
@ 2012-01-18 23:43                 ` Gregory Farnum
  0 siblings, 0 replies; 13+ messages in thread
From: Gregory Farnum @ 2012-01-18 23:43 UTC (permalink / raw)
  To: Karoly Horvath; +Cc: ceph-devel

On Tue, Jan 17, 2012 at 5:08 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
> Hi,
>
> I did this test on the new ceph 0.40-1oneiric package.
>
> I removed the power cord from beta.
>
> alpha became the new active MDS
> 2012-01-17 12:32:36.817012   mds e494: 1/1/1 up {0=alpha=up:active}
> but I still couldn't access the ceph fs, so I tried lazy force umount
> as you suggested:
> umount -lf ceph_mnt_point/
> but it never returned.
>
> I attached the logs.

To diagnose a problem like that, we're going to need the client logs.
If nothing else, there's probably some relevant output in dmesg. Also,
what kernel version?
Everything I checked looked good for the rest of the system; did you
check whether new clients could do things appropriately?
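
Something like this from the client box would cover the basics:

uname -r
dmesg | tail -n 200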
-Greg

> (note: I removed a huge chunk of the alpha mds log covering approx 1 minute)
>
> --
> Karoly Horvath
>
>
> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>> [Re-added list]
>>
>> On Wed, Dec 21, 2011 at 4:33 AM, Karoly Horvath <rhswdev@gmail.com> wrote:
>>> On Wed, Dec 21, 2011 at 12:03 AM, Gregory Farnum
>>>> It looks like maybe you got it turned on in the mon section rather
>>>> than the mds or global sections. :)
>>>
>>> right
>>>
>>>> However, as I look at these a little more it generally looks good,
>>>> even in trial 3. The only alarming thing that's present is a note that
>>>> two of your clients failed to reconnect to the MDS in time and were
>>>> cut off. Did you try establishing a new connection to the cluster and
>>>> seeing if that worked? It's possible there's a client bug, or that
>>>> there was some sort of network error that interfered with them.
>>>> -Greg
>>>
>>> By client I assume you mean the kernel driver.. the FS is freezed, so
>>> I cannot unmount (cannot even `shutdown`).. how can I force the client
>>> to reconnect?
>>
>> Try a lazy force unmount:
>> umount -lf ceph_mnt_point/
>> And then mount again.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-01-18 23:43 UTC | newest]

Thread overview: 13+ messages
2011-12-18 19:26 how can I achieve HA with ceph? Karoly Horvath
2011-12-19 17:40 ` Tommi Virtanen
2011-12-19 22:50 ` Gregory Farnum
2011-12-20 18:07   ` Karoly Horvath
2011-12-20 22:50     ` Gregory Farnum
2011-12-20 23:45       ` Karoly Horvath
     [not found]       ` <CA+o_KfsG=_TvQVNJ1HcUcTvntN5gqKCC3eXFhYwZVj3_fF4wRg@mail.gmail.com>
2011-12-21  0:03         ` Gregory Farnum
     [not found]           ` <CA+o_KfuigaFGYvHVOk_W-gpTQryheYfc4=9v5nxemUyqhH-GQw@mail.gmail.com>
2011-12-21 16:13             ` Gregory Farnum
     [not found]               ` <CA+o_KfvFnNqhiYuySZnhz7jdhj=MXgpiGqZA+D7Dw+fqZ7VNjA@mail.gmail.com>
2011-12-23  0:00                 ` Gregory Farnum
2012-01-05 13:24                   ` Karoly Horvath
2012-01-05 19:06                     ` Gregory Farnum
     [not found]                       ` <CA+o_KftBMHMqk4zRKp3z-fe+KBSjRsoon4OCbc47bWdjQQTQ=w@mail.gmail.com>
2012-01-10  1:06                         ` Gregory Farnum
     [not found]               ` <CA+o_Kft7RWEanbBM8ZUMTaz0ucr-XEOc1tJHK7i304XnspmRyA@mail.gmail.com>
2012-01-18 23:43                 ` Gregory Farnum
