* New pacific mon won't join with octopus mons
@ 2021-08-30 23:49 Chris Dunlop
2021-09-01 17:56 ` Gregory Farnum
0 siblings, 1 reply; 5+ messages in thread
From: Chris Dunlop @ 2021-08-30 23:49 UTC (permalink / raw)
To: ceph-devel
Hi,
Apologies if this isn't the correct mailing list; I've already tried
getting help over on ceph-users but haven't received any response:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YHTXK22C7CMBWCWKUCSL4U32GLQGBSJG/
I'm not sure, but I suspect this is a bug...
I'm stuck mid-upgrade from octopus to pacific using cephadm, at the
point of upgrading the mons.
As background, this cluster started life 10 years ago with whatever was
current at the time, and has been progressively upgraded with each new
release. It's never had a cephfs on it and I ran into:
https://tracker.ceph.com/issues/51673
I created the temporary cephfs per the ticket and then ran into this
next problem...
I have 3 mons still on octopus and in quorum. When I try to bring up a
new pacific mon it stays stuck in "probing" state, with the pacific
ceph-mon chewing up >300% CPU on its host and causing the octopus
ceph-mons on the other hosts to chew up >100% CPU. I've left it for
over 10 minutes in that state and it hasn't progressed past "probing".
First up, I'm assuming a pacific mon is supposed to be able to join a
quorum with octopus mons, otherwise how is an upgrade supposed to
work?
The pacific mon, "b5", is v16.2.5 running under podman:
docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
The octopus mons are all v15.2.14. The lead octopus mon ("k2") is
running under podman:
quay.io/ceph/ceph:v15
The other 2 octopus mons ("b2" and "b4") are standalone
15.2.14-1~bpo10+1. These are manually started due to the cephadm
upgrade failing at the point of upgrading the mons and leaving me with
only one cephadm mon running.
On start up of the pacific mon with debug_mon=20, over a 30 second
period I see 765 "handle_probe_reply" sequences by the pacific mon,
with the lead mon replying 200 times and the other two mons replying
279 times each.
In the very first "handle_probe_reply" sequence we see an expected (I
assume) "bootstrap", as the starting-up mon doesn't yet have a monmap
("got newer/committed monmap epoch 35, mine was 0"):
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 monmap is e0: 3 mons at {noname-a=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],noname-b=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],noname-c=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 got newer/committed monmap epoch 35, mine was 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 bootstrap
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5646293a0a20
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5646293a0a20 after 10 seconds
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _ms_dispatch new session 0x5646293aafc0 MonSession(mon.1 [v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0] is open , features 0x3f01cfb8ffedffff (luminous)) features 0x3f01cfb8ffedffff
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 5 mon.b5@-1(probing) e35 _ms_dispatch setting monitor caps on this connection
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35 entity_name global_id 0 (none) caps allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 is_capable service=mon command= read addr v2:10.200.63.130:3300/0 on cap allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 allow so far , doing grant allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 allow all
However, every single "handle_probe_reply" sequence after that also
does a "bootstrap", even though by this point the mon has a monmap
with the same epoch ("got newer/committed monmap epoch 35, mine was
35"), e.g.:
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 got newer/committed monmap epoch 35, mine was 35
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 bootstrap
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5646293a0a20
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5646293a0a20 after 10 seconds
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5646293aafc0 for mon.1
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35 entity_name global_id 0 (none) caps allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 is_capable service=mon command= read addr v2:10.200.63.130:3300/0 on cap allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 allow so far , doing grant allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 allow all
That same sequence keeps repeating, just with the "name" changing
between the octopus mons, i.e. "name b4", "name b2" and "name k2".
Referencing Monitor::handle_probe_reply() in src/mon/Monitor.cc (see
also below): if we ever got past the bootstrap() we should see a
message mentioning "peer", but we never do.
That implies "mybl.contents_equal(m->monmap_bl)" is never true, and
"has_ever_joined" is never true.
What's going on here?
Cheers,
Chris
----------------------------------------------------------------------
void Monitor::handle_probe_reply(MonOpRequestRef op)
{
  ...
  // newer map, or they've joined a quorum and we haven't?
  bufferlist mybl;
  monmap->encode(mybl, m->get_connection()->get_features());
  // make sure it's actually different; the checks below err toward
  // taking the other guy's map, which could cause us to loop.
  if (!mybl.contents_equal(m->monmap_bl)) {
    MonMap *newmap = new MonMap;
    newmap->decode(m->monmap_bl);
    if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() ||
                               !has_ever_joined)) {
      dout(10) << " got newer/committed monmap epoch " << newmap->get_epoch()
               << ", mine was " << monmap->get_epoch() << dendl;
      delete newmap;
      monmap->decode(m->monmap_bl);
      notify_new_monmap(false);

      bootstrap();
      return;
    }
    delete newmap;
  }

  // rename peer?
  string peer_name = monmap->get_name(m->get_source_addr());
  if (monmap->get_epoch() == 0 && peer_name.compare(0, 7, "noname-") == 0) {
    dout(10) << " renaming peer " << m->get_source_addr() << " "
             << peer_name << " -> " << m->name << " in my monmap"
             << dendl;
    monmap->rename(peer_name, m->name);

    if (is_electing()) {
      bootstrap();
      return;
    }
  } else if (peer_name.size()) {
    dout(10) << " peer name is " << peer_name << dendl;
  } else {
    dout(10) << " peer " << m->get_source_addr() << " not in map" << dendl;
  }
  ...
}
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: New pacific mon won't join with octopus mons
2021-08-30 23:49 New pacific mon won't join with octopus mons Chris Dunlop
@ 2021-09-01 17:56 ` Gregory Farnum
2021-09-02 1:41 ` Chris Dunlop
0 siblings, 1 reply; 5+ messages in thread
From: Gregory Farnum @ 2021-09-01 17:56 UTC (permalink / raw)
To: Chris Dunlop; +Cc: ceph-devel
Why are you trying to create a new pacific monitor instead of
upgrading an existing one?
I *think* what's going on here is that since you're deploying a new
pacific mon, and you're not giving it a starting monmap, it's set up
to assume the use of pacific features. It can find peers at the
locations you've given it, but since they're on octopus there are
mismatches.
Now, I would expect and want this to work, so you should file a bug.
But the initial bootstrapping code is a bit hairy and may not account
for cross-version initial setup in this fashion, or may have grown
buggy since it was written. So I'd try upgrading the existing mons, or
generating a new octopus mon and upgrading that one to pacific if
you're feeling leery.
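If you do end up seeding a fresh mon, the usual manual-add sequence is
roughly the following sketch, based on the standard procedure of giving
the new daemon the cluster's current monmap so it doesn't invent one of
its own. The paths and the mon id "b5" are just placeholders:

```shell
# Fetch the quorum's current monmap and the mon. keyring,
# then pre-seed the new mon's store with both before first start.
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
ceph-mon -i b5 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
```

Whether that avoids the probing loop here is untested; it just removes
the "no starting monmap" variable from the equation.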
-Greg
On Mon, Aug 30, 2021 at 4:55 PM Chris Dunlop <chris@onthe.net.au> wrote:
> [full quote of original message snipped; see above]
* Re: New pacific mon won't join with octopus mons
2021-09-01 17:56 ` Gregory Farnum
@ 2021-09-02 1:41 ` Chris Dunlop
2021-09-02 18:19 ` Drew Freiberger
0 siblings, 1 reply; 5+ messages in thread
From: Chris Dunlop @ 2021-09-02 1:41 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel
Hi Gregory,
On Wed, Sep 01, 2021 at 10:56:56AM -0700, Gregory Farnum wrote:
> Why are you trying to create a new pacific monitor instead of
> upgrading an existing one?
The "ceph orch upgrade" failed twice at the point of upgrading the
mons, once due to the octopus mons getting the "--init" argument added
to their docker startup and the docker version on Debian Buster not
supporting both the "--init" and "-v /dev:/dev" args at the same time,
per:
https://github.com/moby/moby/pull/37665
...and once due to never having a cephfs on the cluster:
https://tracker.ceph.com/issues/51673
So at one point I had one mon down due to the failed upgrade, then
another of the 3 originals was taken out by the host's disk filling up
(due, I think, to the excessive logging occurring at the time in
combination with having both docker and podman images pulled in),
leaving me with a single octopus mon running and no quorum, bringing
the cluster to a standstill, and me panic-learning how to deal with
the situation. Fun times.
So yes, I was feeling just a little leery about upgrading the octopus
mons and potentially losing quorum again!
> I *think* what's going on here is that since you're deploying a new
> pacific mon, and you're not giving it a starting monmap, it's set up
> to assume the use of pacific features. It can find peers at the
> locations you've given it, but since they're on octopus there are
> mismatches.
>
> Now, I would expect and want this to work so you should file a bug,
https://tracker.ceph.com/issues/52488
> but the initial bootstrapping code is a bit hairy and may not account
> for cross-version initial setup in this fashion, or may have grown
> buggy since it was written. So I'd try upgrading the existing mons, or
> generating a new octopus mon and upgrading that one to pacific if
> you're feeling leery.
Yes, I thought a safer / less stressful way of progressing would be to
add a new octopus mon to the existing quorum and upgrade that one
first as a test. I went ahead with that and checked the cluster health
immediately afterwards: "ceph -s" showed HEALTH_OK, with 4 mons, i.e.
3 x octopus and 1 x pacific.
Nice!
But shortly later alarms started going off and the health of the
cluster was coming back as more than a little gut-wrenching, with ALL
pgs showing up as inactive / unknown:
$ ceph -s
  cluster:
    id:     c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
    health: HEALTH_WARN
            Reduced data availability: 5721 pgs inactive
            (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)

  services:
    mon: 4 daemons, quorum k2,b2,b4,b5 (age 43m)
    mgr: b5(active, starting, since 40m), standbys: b4, b2
    osd: 78 osds: 78 up (since 4d), 78 in (since 3w)
         flags noout

  data:
    pools:   12 pools, 5721 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5721 unknown

$ ceph health detail
HEALTH_WARN Reduced data availability: 5721 pgs inactive; (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)
(MUTED) [WRN] OSDMAP_FLAGS: noout flag(s) set
[WRN] PG_AVAILABILITY: Reduced data availability: 5721 pgs inactive
    pg 6.fcd is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fce is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fcf is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fd0 is stuck inactive for 41m, current state unknown, last acting []
    ...etc.
So that was also heaps of fun for a while, until I thought to remove
the pacific mon and the health reverted to normal. Bug filed:
https://tracker.ceph.com/issues/52489
At this point I'm more than a little gun-shy, but I'm girding my loins
to go ahead with the rest of the upgrade, on the basis that the health
issue is "just" a temporary reporting problem (albeit a highly
startling one!) with mixed octopus and pacific mons.
Cheers,
Chris
* Re: New pacific mon won't join with octopus mons
2021-09-02 1:41 ` Chris Dunlop
@ 2021-09-02 18:19 ` Drew Freiberger
2021-09-02 23:36 ` Chris Dunlop
0 siblings, 1 reply; 5+ messages in thread
From: Drew Freiberger @ 2021-09-02 18:19 UTC (permalink / raw)
To: Chris Dunlop, Gregory Farnum; +Cc: ceph-devel
Given your history of being on pre-octopus releases in this environment
and having gone through other upgrades, have you ensured you completed
all steps of each upgrade and set your minimum versions per the
documentation? I've seen cases where octopus will drop nautilus osds if
the osd_minimum_version wasn't set to nautilus before the octopus mons
were introduced. I wonder if you're running into something similar with
the o->p upgrade.
Check here for further details about completing upgrades (this started
in Luminous series):
https://ceph.io/en/news/blog/2017/new-luminous-upgrade-complete/
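A quick way to verify the previous upgrade was fully completed is to check
the release floor recorded in the OSD map. A hedged sketch, shown here
against an assumed captured line from `ceph osd dump` (on a live cluster,
run `ceph osd dump | grep require_osd_release` directly):

```shell
# Extract the recorded release floor from (sample) `ceph osd dump` output.
# Before introducing pacific mons to an octopus cluster, this should
# already read "octopus"; the sample line below is an assumed capture.
sample='require_osd_release octopus'

release=$(printf '%s\n' "$sample" | awk '/require_osd_release/ {print $2}')
echo "require_osd_release: $release"

if [ "$release" != octopus ]; then
    echo "WARNING: finish the previous upgrade before adding pacific mons" >&2
fi
```

The same idea applies to `ceph versions` and `ceph mon dump` output:
confirm every daemon and the mon map agree on the release before mixing in
a newer mon.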
-Drew
On 9/1/21 8:41 PM, Chris Dunlop wrote:
> Hi Gregory,
>
> On Wed, Sep 01, 2021 at 10:56:56AM -0700, Gregory Farnum wrote:
>> Why are you trying to create a new pacific monitor instead of
>> upgrading an existing one?
>
> The "ceph orch upgrade" failed twice at the point of upgrading the
> mons, once due to the octopus mons getting the "--init" argument added
> to their docker startup and the docker version on Debian Buster not
> supporting both the "--init" and "-v /dev:/dev" args at the same time,
> per:
>
> https://github.com/moby/moby/pull/37665
>
> ...and once due to never having a cephfs on the cluster:
>
> https://tracker.ceph.com/issues/51673
>
> So at one point I had one mon down due to the failed upgrade, then
> another of the 3 originals was taken out by the host's disk filling up
> (due, I think, to the excessive logging occurring at the time in
> combination with having both docker and podman images pulled in),
> leaving me with a single octopus mon running and no quorum, bringing
> the cluster to a stand still, and me panic-learning how to deal with
> the situation. Fun times.
>
> So yes, I was feeling just a little leery about upgrading the octopus
> mons and potentially losing quorum again!
>
>> I *think* what's going on here is that since you're deploying a new
>> pacific mon, and you're not giving it a starting monmap, it's set up
>> to assume the use of pacific features. It can find peers at the
>> locations you've given it, but since they're on octopus there are
>> mismatches.
>>
>> Now, I would expect and want this to work so you should file a bug,
>
> https://tracker.ceph.com/issues/52488
>
>> but the initial bootstrapping code is a bit hairy and may not account
>> for cross-version initial setup in this fashion, or have gotten buggy
>> since written. So I'd try upgrading the existing mons, or generating a
>> new pacific mon and upgrading that one to octopus if you're feeling
>> leery.
>
> Yes, I thought a safer / less stressful way of progressing would be to
> add a new octopus mon to the existing quorum and upgrade that one
> first as a test. I went ahead with that and checked the cluster health
> immediately afterwards: "ceph -s" showed HEALTH_OK, with 4 mons, i.e.
> 3 x octopus and 1 x pacific.
>
> Nice!
> But shortly later alarms started going off and the health of the
> cluster was coming back as more than a little gut-wrenching, with ALL
> pgs showing up as inactive / unknown:
>
> $ ceph -s
>   cluster:
>     id:     c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
>     health: HEALTH_WARN
>             Reduced data availability: 5721 pgs inactive
>             (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)
>
>   services:
>     mon: 4 daemons, quorum k2,b2,b4,b5 (age 43m)
>     mgr: b5(active, starting, since 40m), standbys: b4, b2
>     osd: 78 osds: 78 up (since 4d), 78 in (since 3w)
>          flags noout
>
>   data:
>     pools:   12 pools, 5721 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:     100.000% pgs unknown
>              5721 unknown
>
> $ ceph health detail
> HEALTH_WARN Reduced data availability: 5721 pgs inactive; (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)
> (MUTED) [WRN] OSDMAP_FLAGS: noout flag(s) set
> [WRN] PG_AVAILABILITY: Reduced data availability: 5721 pgs inactive
>     pg 6.fcd is stuck inactive for 41m, current state unknown, last acting []
>     pg 6.fce is stuck inactive for 41m, current state unknown, last acting []
>     pg 6.fcf is stuck inactive for 41m, current state unknown, last acting []
>     pg 6.fd0 is stuck inactive for 41m, current state unknown, last acting []
> ...etc.
>
> So that was also heaps of fun for a while, until I thought to remove
> the pacific mon and the health reverted to normal. Bug filed:
>
> https://tracker.ceph.com/issues/52489
>
> At this point I'm more than a little gun shy, but I'm girding my loins
> to go ahead with the rest of the upgrade on the basis the health issue
> is "just" a temporary reporting problem (albeit a highly startling
> one!) with mixed octopus and pacific mons.
>
> Cheers,
>
> Chris
Thread overview: 5+ messages
2021-08-30 23:49 New pacific mon won't join with octopus mons Chris Dunlop
2021-09-01 17:56 ` Gregory Farnum
2021-09-02 1:41 ` Chris Dunlop
2021-09-02 18:19 ` Drew Freiberger
2021-09-02 23:36 ` Chris Dunlop