From: Ugis
Subject: Proposal on monitor backup and ceph DR in general
Date: Wed, 11 Mar 2020 21:23:16 +0200
To: Ceph Development, sage@redhat.com, Wido den Hollander, clewis@centraldesktop.com

Hi,

Returning to the ceph monitor backup topic after some time. This is mainly meant to spark discussion, as there is still room for improvement in the ceph disaster recovery area.

I have reviewed the available information on how to plan disaster recovery for a ceph cluster, focusing on keeping monitor data safe. For example the official "Troubleshooting monitors"
https://docs.ceph.com/docs/nautilus/rados/troubleshooting/troubleshooting-mon/
or Wido's older blog post
https://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/
and several threads on this mailing list. The conclusion is that you don't back up ceph monitors, you just increase their count to reduce the likelihood of all of them failing at once.

Still, things are different when the OSDs are encrypted with dmcrypt. Let's say a power outage happens and all monitor data is lost. Then even "Recovery using OSDs" (the procedure I mean is sketched further below) would not help, as the LUKS keys live only in the monitors - right?

So for the proposal part - to have complete and easy-to-use ceph DR, something like the following would be needed: the vital info (monitor DBs, ceph config, mgr and mds data etc.) stored in a redundant and convenient-to-restore form.

- OSDs or the monitors themselves should be the place to store that redundant critical info. In the case of OSDs - not all of them, just some specified count, enough to be sure at least one will survive in any case. OSDs are better candidates than monitors because they have a higher probability of being distributed across racks and PDUs.
- On those selected OSDs a special robust storage could be implemented (I'm not a specialist here, but a copy-on-write fs or a circular buffer seem suitable). It could be a raw, dedicated partition, to be sure there is no filesystem underneath that could break. A size of 1GB should be enough(?)
- When enabling this "ceph recovery feature" the user would have to specify a password, and all data would be encrypted with it. Maybe just use a LUKS container, which already handles passwords, and put the CoW/ring buffer inside it.
- Monitors and other critical daemons would have to stream/replicate their DB changes to these dedicated safe-storages. I don't know exactly how mons communicate today and whether this could be done fast enough, but maybe etcd would help here: https://etcd.io/
- Eventually all critical changes would be streamed to those robust, distributed containers protected by a password.
- In case disaster strikes, there would be a special tool that can recover all the needed info. I envision a tool that you run on a reinstalled mon/mgr/mds node, supplying that safe-storage LUKS blob plus the password as input and specifying what to restore (mon/mds/mgr/all). It would restore the requested daemon data to within a fraction of a second of the crash, and the user could then easily bring back the mon quorum and the rest of ceph's vital parts (a rough sketch of how this could look is further below).
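For reference, the "Recovery using OSDs" procedure from the troubleshooting doc linked above is, roughly and from memory (paths are just examples, see the doc for the full loop over all OSD hosts):

    # run for every OSD, accumulating everything into one temporary mon store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op update-mon-db --mon-store-path /tmp/mon-store
    # then rebuild the monitor store from the collected data
    ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring

As I understand it, this rebuilds the mon store from OSD metadata, but if the OSDs themselves are dmcrypted and the keys only ever lived in the mon key/value store, you cannot even get that far.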
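To make the proposal a bit more concrete, a very rough sketch. The cryptsetup calls are real; everything else ("ceph-safe-store", "ceph-recovery-tool", /dev/sdX5 as the small dedicated raw partition on a selected OSD host) is made up by me and does not exist today:

    # one-time setup of a "safe-storage" slot on a selected OSD host
    cryptsetup luksFormat /dev/sdX5                 # asks for the recovery password
    cryptsetup luksOpen /dev/sdX5 ceph-safe-store
    # mons/mgrs/mds (or a new recovery daemon) would then stream their DB changes
    # into the ring buffer / CoW structure inside /dev/mapper/ceph-safe-store

    # after a disaster, on a freshly reinstalled mon node
    cryptsetup luksOpen /dev/sdX5 ceph-safe-store   # same recovery password
    ceph-recovery-tool --source /dev/mapper/ceph-safe-store \
                       --restore mon \
                       --target /var/lib/ceph/mon/ceph-$(hostname -s)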
It seems to me that either the mons/mgrs need to grow a "ceph recovery module" that does all of the above, or a dedicated daemon is needed for it. Such daemons should also be deployable in remote locations that are reachable over a fast enough link.

Sure, the above may seem like over-complicating things, but a discussion is needed, as there is no single Disaster Recovery chapter for ceph itself in the documentation. There are scattered parts on how to recover monitors, how to mirror RBDs etc., but no single place with a procedure along the lines of "implement these steps and you will be able to easily recover your ceph cluster".

BR,
Ugis