* upgrade from v0.94.6 or lower and 'failed to encode map X with expected crc'
@ 2016-10-04  7:18 kefu chai
From: kefu chai @ 2016-10-04  7:18 UTC
  To: ceph-users, ceph-devel

hi ceph users,

If you upgrade a cluster from a release prior to v0.94.7 to v0.94.7 or
later by following these steps:

1. upgrade the monitors first,
2. and then the OSDs,

you should expect the cluster log to be flooded with messages like:

2016-07-12 08:42:42.1234567 osd.1234 [WRN] failed to encode map e4321 with expected crc

This happens because we changed[1] the encoding of OSDMap in v0.94.7.
Once all quorum members are running the new version, the monitors
start sending incremental OSDMaps with the new encoding to the OSDs.
But OSDs still running the old version re-encode the osdmaps with the
old encoding and then compare the resulting CRC with the one carried
by the received incremental map. The two don't match, so in this case
the OSDs ask the monitors for the full map.
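To make the mismatch concrete, here is a minimal sketch of that exchange using a toy map. Everything in it is hypothetical: `ToyOSDMap`, `encode_old`, `encode_new`, `ToyIncremental`, `make_incremental` and `apply_incremental` are not Ceph APIs, and the checksum is a plain bitwise CRC-32C chosen for illustration rather than a model of Ceph's exact CRC setup. The monitor computes the CRC over the new encoding, the old OSD re-encodes with the old layout, and the CRCs differ, so it falls back to asking for the full map.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for OSDMap.
struct ToyOSDMap {
  uint32_t epoch;
  uint32_t num_up_osds;
};

// Plain bitwise CRC-32C (reflected polynomial 0x82F63B78), for illustration only.
uint32_t crc32c(const std::vector<uint8_t>& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (uint8_t byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

void put_u32(std::vector<uint8_t>& bl, uint32_t v) {
  for (int i = 0; i < 4; ++i) bl.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Hypothetical "old" (pre-v0.94.7) layout.
std::vector<uint8_t> encode_old(const ToyOSDMap& m) {
  std::vector<uint8_t> bl;
  bl.push_back(1);  // struct version 1
  put_u32(bl, m.epoch);
  put_u32(bl, m.num_up_osds);
  return bl;
}

// Hypothetical "new" (v0.94.7+) layout: bumped version plus an extra field.
std::vector<uint8_t> encode_new(const ToyOSDMap& m) {
  std::vector<uint8_t> bl;
  bl.push_back(2);  // struct version 2
  put_u32(bl, m.epoch);
  put_u32(bl, m.num_up_osds);
  put_u32(bl, 0);  // stands in for data the old encoder does not write
  return bl;
}

// Hypothetical incremental: the changes plus the CRC of the resulting full map.
struct ToyIncremental {
  uint32_t new_epoch;
  uint32_t new_num_up_osds;
  uint32_t full_crc;
};

// Monitor side, once every quorum member runs the new version.
ToyIncremental make_incremental(const ToyOSDMap& next) {
  return ToyIncremental{next.epoch, next.num_up_osds, crc32c(encode_new(next))};
}

enum class OsdAction { Applied, RequestFullMap };

// OSD side: apply, re-encode with the OSD's own encoder, verify the CRC.
OsdAction apply_incremental(ToyOSDMap& map, const ToyIncremental& inc, bool osd_upgraded) {
  ToyOSDMap next = map;
  next.epoch = inc.new_epoch;
  next.num_up_osds = inc.new_num_up_osds;
  const std::vector<uint8_t> bl = osd_upgraded ? encode_new(next) : encode_old(next);
  if (crc32c(bl) != inc.full_crc) {
    // This is where the real OSDs log "failed to encode map eN with expected crc".
    return OsdAction::RequestFullMap;  // keep the current map, ask for the full one
  }
  map = next;
  return OsdAction::Applied;
}
```

With the same incremental, an upgraded OSD applies it cleanly, while an old OSD ends up in the `RequestFullMap` branch for every new epoch.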

For a large Ceph cluster, the CRC mismatches have several consequences:
1. the monitors are flooded by these cluster log messages;
2. the monitors are burdened by sending the full maps;
3. the network is saturated by the osdmap messages carrying the
   requested full maps;
4. slow requests are observed if the updated osdmaps are delayed by the
   saturated network.

These problems have been reported[2,3,4,5] by our users.
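A back-of-envelope model shows why cluster size matters here. This is a simplified sketch under stated assumptions, not a measurement. It assumes that each old-version OSD requests one full map per new epoch, on top of the incremental it already received, and it ignores all map traffic other than monitor-to-OSD. `estimate_traffic` and its inputs are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical traffic estimate; every input is a caller-supplied assumption.
struct MapTraffic {
  uint64_t incremental_bytes;  // incrementals assumed to be sent to each old OSD
  uint64_t extra_full_bytes;   // extra full-map traffic caused by the CRC mismatch
};

MapTraffic estimate_traffic(uint64_t old_osds, uint64_t epochs,
                            uint64_t incremental_size, uint64_t full_map_size) {
  MapTraffic t;
  t.incremental_bytes = old_osds * epochs * incremental_size;
  // Assumption: one full-map request per old OSD per epoch.
  t.extra_full_bytes = old_osds * epochs * full_map_size;
  return t;
}
```

With purely illustrative inputs of 1000 old OSDs, 10 epochs, 4 KiB incrementals and a 1 MiB full map, the extra traffic is about 10.5 GB versus about 41 MB of incrementals. It grows with both the number of OSDs and the full map size, and the full map itself grows with the cluster.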

The interim solution for those who are stuck in the middle of an upgrade is:

1. revert all the monitors to the previous version,
2. upgrade the OSDs to the target version,
3. upgrade the monitors to the target version.
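The interim procedure can be encoded as a small decision function. This is a hypothetical sketch, not a Ceph tool. It models each daemon only as running an old (< v0.94.7) or new (>= v0.94.7) OSDMap encoding. Like the procedure itself, it is conservative: it reverts the monitors whenever any monitor is new while some OSD is still old.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: Old = release before v0.94.7, New = v0.94.7 or later.
enum class Enc { Old, New };

struct ClusterState {
  std::vector<Enc> mons;
  std::vector<Enc> osds;
};

enum class Step { RevertMons, UpgradeOsds, UpgradeMons, Done };

bool all_are(const std::vector<Enc>& daemons, Enc e) {
  for (Enc d : daemons)
    if (d != e) return false;
  return true;
}

// Next action for a cluster stuck mid-upgrade:
// revert the monitors, upgrade the OSDs, then upgrade the monitors.
Step next_step(const ClusterState& s) {
  const bool osds_done = all_are(s.osds, Enc::New);
  const bool mons_done = all_are(s.mons, Enc::New);
  if (osds_done && mons_done) return Step::Done;
  if (!osds_done) {
    // While any OSD is still old, no monitor should run the new version.
    if (!all_are(s.mons, Enc::Old)) return Step::RevertMons;
    return Step::UpgradeOsds;
  }
  return Step::UpgradeMons;
}
```

Calling `next_step` repeatedly and applying each returned step walks a stuck cluster through the three steps above in order.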

For users who plan to upgrade from a version prior to v0.94.7 to
v0.94.7 or later, please:
1. upgrade the OSDs to the target version,
2. upgrade the monitors to the target version.

If your cluster is relatively large and you plan to upgrade from a
version prior to v0.94.7 to jewel, we suggest upgrading to the latest
hammer release first, following the steps above.
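The advice for planned upgrades can be sketched as a hypothetical planner; it is not a Ceph tool. Hammer releases are identified by x in v0.94.x. The operator supplies which hammer release is latest and whether the cluster counts as "relatively large", because the sketch cannot know either. This note does not prescribe the order for the hammer-to-jewel step itself, so the sketch defers that step to the jewel release notes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planner for the advice in this note; not a Ceph tool.
struct UpgradeRequest {
  int from_patch;           // currently running v0.94.<from_patch>
  bool to_jewel;            // true: target is jewel
  int to_patch;             // hammer target v0.94.<to_patch>; ignored for jewel
  int latest_hammer_patch;  // supplied by the operator
  bool large_cluster;       // "relatively large" is the operator's judgment
};

std::string hammer(int patch) { return "v0.94." + std::to_string(patch); }

std::vector<std::string> plan_upgrade(const UpgradeRequest& r) {
  std::vector<std::string> plan;
  // Crossing the v0.94.7 OSDMap encoding change: OSDs first, then monitors.
  auto osds_then_mons = [&plan](int patch) {
    plan.push_back("upgrade OSDs to " + hammer(patch));
    plan.push_back("upgrade monitors to " + hammer(patch));
  };
  const bool crosses_change = r.from_patch < 7;
  if (!r.to_jewel) {
    if (crosses_change && r.to_patch >= 7) {
      osds_then_mons(r.to_patch);
    } else {
      plan.push_back("upgrade to " + hammer(r.to_patch) + " (not covered by this note)");
    }
    return plan;
  }
  if (crosses_change && r.large_cluster) osds_then_mons(r.latest_hammer_patch);
  plan.push_back("upgrade to jewel following the jewel release notes");
  return plan;
}
```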

In the short term, we are preparing a fix[6] for hammer so that the
monitors will send osdmaps using the older (lower-version) encoding.
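One way to choose a lower-version encoding is to encode with only the features that every OSD shares. The following is a minimal sketch of that idea, assuming a hypothetical feature bit that means "understands the v0.94.7 OSDMap encoding". The name `TOY_FEATURE_NEW_OSDMAP_ENC`, its value, and both functions are illustrative, and the actual fix in [6] may select the encoding differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit; not one of Ceph's actual constants.
constexpr uint64_t TOY_FEATURE_NEW_OSDMAP_ENC = 1ull << 40;

// Features shared by every OSD (bitwise AND); 0 if there are no OSDs.
uint64_t common_osd_features(const std::vector<uint64_t>& osd_features) {
  if (osd_features.empty()) return 0;
  uint64_t common = ~0ull;
  for (uint64_t f : osd_features) common &= f;
  return common;
}

// 1 = pre-v0.94.7 layout, 2 = v0.94.7+ layout (hypothetical version numbers).
int osdmap_encoding_version(uint64_t features) {
  return (features & TOY_FEATURE_NEW_OSDMAP_ENC) ? 2 : 1;
}
```

With this choice, a single old OSD keeps the whole cluster on the old encoding. Its own re-encoding then reproduces the monitor's bytes and CRC.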

In the long term, the cluster will not use the new release feature bit
unless it is explicitly allowed[7].
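The long-term approach amounts to masking the release bit out of the features the monitor hands to the encoders until an operator allows it. Below is a hypothetical sketch. `ToyMonitor`, `allow_release_bit` and `TOY_FEATURE_SERVER_NEXT_RELEASE` are illustrative names, not the mechanism in [7]. The rule that every OSD must already advertise the bit before the flag can be set is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a major release feature bit; value is illustrative.
constexpr uint64_t TOY_FEATURE_SERVER_NEXT_RELEASE = 1ull << 50;

struct ToyMonitor {
  bool release_bit_allowed = false;  // hypothetical operator-controlled flag

  // Hypothetical operator action; requiring every OSD to already advertise
  // the bit is an assumption of this sketch.
  bool allow_release_bit(const std::vector<uint64_t>& osd_features) {
    for (uint64_t f : osd_features)
      if ((f & TOY_FEATURE_SERVER_NEXT_RELEASE) == 0) return false;
    release_bit_allowed = true;
    return true;
  }

  // Features passed to the encoders: the release bit stays masked out until
  // explicitly allowed, so release-gated encodings remain off.
  uint64_t features_for_encoding(uint64_t quorum_features) const {
    if (release_bit_allowed) return quorum_features;
    return quorum_features & ~TOY_FEATURE_SERVER_NEXT_RELEASE;
  }
};
```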


@ceph developers,

So if we want to bump the encoding version of OSDMap or any of its
(sub)fields, I think it would be desirable to tie the new encoder to
the new major release feature bit. For instance, if a new field named
"foo" is added to `pg_pool_t` in kraken, then, since
`map<int64_t,pg_pool_t> pools` is in turn a field of `OSDMap`, we need
to be careful when updating `pg_pool_t::encode()`, like:

void pg_pool_t::encode(bufferlist& bl, uint64_t features) const {
  // ...
  if ((features & CEPH_FEATURE_SERVER_KRAKEN) == 0) {
    // encode in the jewel way
    return;
  }
  // encode in the kraken way
}

This is because:

- otherwise it would be difficult for the monitors to send osdmaps that
  every OSD can understand;
- with [7], we enable or disable the new encoder by including or
  excluding the major release feature bit.
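A self-contained toy version of the `pg_pool_t::encode()` pattern above may help. It is a sketch under stated assumptions: `ToyBufferList` is a plain byte vector standing in for `ceph::bufferlist`, `TOY_FEATURE_SERVER_KRAKEN` is an illustrative bit and not the real value of `CEPH_FEATURE_SERVER_KRAKEN`, and Ceph's own versioning macros (such as `ENCODE_START`) are not modeled. It shows the key property: as long as the monitor leaves the kraken bit out, the encoder produces exactly the jewel-era bytes, so a jewel daemon re-encoding the same pool gets the same CRC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using ToyBufferList = std::vector<uint8_t>;  // stand-in for ceph::bufferlist

// Illustrative bit only; not the real value of CEPH_FEATURE_SERVER_KRAKEN.
constexpr uint64_t TOY_FEATURE_SERVER_KRAKEN = 1ull << 57;

// Toy version of pg_pool_t with the hypothetical kraken field "foo".
struct ToyPool {
  uint32_t size = 0;
  uint32_t foo = 0;

  void encode(ToyBufferList& bl, uint64_t features) const {
    if ((features & TOY_FEATURE_SERVER_KRAKEN) == 0) {
      // Encode in the jewel way: version 1, no "foo".
      bl.push_back(1);
      put_u32(bl, size);
      return;
    }
    // Encode in the kraken way: version 2 appends "foo".
    bl.push_back(2);
    put_u32(bl, size);
    put_u32(bl, foo);
  }

  // Accepts both layouts; "foo" keeps its default when absent.
  static ToyPool decode(const ToyBufferList& bl) {
    if (bl.empty()) throw std::runtime_error("empty buffer");
    std::size_t off = 1;
    ToyPool p;
    p.size = get_u32(bl, off);
    if (bl[0] >= 2) p.foo = get_u32(bl, off);
    return p;
  }

 private:
  static void put_u32(ToyBufferList& bl, uint32_t v) {
    for (int i = 0; i < 4; ++i) bl.push_back(static_cast<uint8_t>(v >> (8 * i)));
  }
  static uint32_t get_u32(const ToyBufferList& bl, std::size_t& off) {
    if (off + 4 > bl.size()) throw std::runtime_error("truncated buffer");
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(bl[off + i]) << (8 * i);
    off += 4;
    return v;
  }
};
```

Because the branch is keyed on the release feature bit rather than on the daemon's own version, the monitor can switch every daemon's encoding at once simply by including or excluding that bit.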

--
[1] sha1 039240418060c9a49298dacc0478772334526dce
[2] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30783.html
[3] http://www.spinics.net/lists/ceph-users/msg28296.html
[4] http://ceph-users.ceph.narkive.com/rPGrATpE/v0-94-7-hammer-released
[5] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013189.html
[6] http://tracker.ceph.com/issues/17386
[7] https://github.com/ceph/ceph/pull/11284

-- 
Regards
Kefu Chai
