From: "Yan, Zheng" <ukernel@gmail.com>
To: Sage Weil <sage@inktank.com>
Cc: Alexandre Oliva <oliva@gnu.org>,
	sam.just@inktank.com, ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH] reinstate ceph cluster_snap support
Date: Wed, 28 Aug 2013 08:54:31 +0800	[thread overview]
Message-ID: <CAAM7YA=9A37Paf3+NjM3hYD5Ro2tu0hkUwLrLt_aLtzBmky-aQ@mail.gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1308271515500.24783@cobra.newdream.net>

On Wed, Aug 28, 2013 at 6:21 AM, Sage Weil <sage@inktank.com> wrote:
> Hi,
>
> On Sat, 24 Aug 2013, Alexandre Oliva wrote:
>> On Aug 23, 2013, Sage Weil <sage@inktank.com> wrote:
>>
>> > FWIW Alexandre, this feature was never really complete.  For it to work,
>> > we also need to snapshot the monitors, and roll them back as well.
>>
>> That depends on what's expected from the feature, actually.
>>
>> One use is to roll back a single osd, and for that, the feature works
>> just fine.  Of course, for that one doesn't need the multi-osd snapshots
>> to be mutually consistent, but it's still convenient to be able to take
>> a global snapshot with a single command.
>>
>> Another use is to roll back the entire cluster to an earlier state, and
>> for that, you *probably* want to roll back the monitors too, although it
>> doesn't seem like this is strictly necessary, unless some significant
>> configuration changes occurred in the cluster since the snapshot was
>> taken, and you want to roll back those too.
>>
>> In my experience, rolling back only osds has worked just fine, with the
>> exception of cases in which the snapshot is much too old, and the mons
>> have already expired osdmaps after the last one the osd got when the
>> snapshot was taken.  For this one case, I have a patch that enables the
>> osd to rejoin the cluster in spite of the expired osdmaps, which has
>> always worked for me, but I understand there may be exceptional cluster
>> reconfigurations in which this wouldn't have worked.
>>
>>
>> As for snapshotting monitors...  I suppose the way to go is to start a
>> store.db dump in background, instead of taking a btrfs snapshot, since
>> the store.db is not created as a subvolume.  That said, it would make
>> some sense to make it so, to make it trivially snapshottable.
>>
>>
>> Anyway, I found a problem in the earlier patch: when I added a new disk
>> to my cluster this morning, it tried to iterate over osdmaps that were
>> not available (e.g. the long-gone osdmap 1), and crashed.
>>
>> Here's a fixed version, that makes sure we don't start the iteration
>> before m->get_first().
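
For clarity, a minimal sketch of what Alexandre's fixed bound amounts to
(the field names below are my approximation, not the actual patch):

    // illustrative only: never start below the first epoch carried by the
    // message, since older maps (e.g. the long-gone epoch 1) may already
    // have been trimmed by the monitors
    epoch_t start = std::max(superblock.newest_map + 1, m->get_first());
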
>
> In principle, we can add this back in.  I think it needs a few changes,
> though.
>
> First, FileStore::snapshot() needs to pause and drain the workqueue before
> taking the snapshot, similar to what is done with the sync sequence.
> Otherwise it isn't a transactionally consistent snapshot and may tear some
> update.  Because it is draining the work queue, it *might* also need to
> drop some locks, but I'm hopeful that that isn't necessary.
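
A rough sketch of the drain-then-snapshot sequence you describe; the
member names (op_tp, op_wq, backend) and create_checkpoint() are my
approximation of the FileStore internals, not the exact code:

    int FileStore::snapshot(const string& name)
    {
      op_tp.pause();      // stop the op threads from starting new transactions
      op_wq.drain();      // wait for already-queued transactions to finish
      int r = backend->create_checkpoint(name, NULL);  // e.g. btrfs subvolume snapshot
      op_tp.unpause();    // resume normal operation
      return r;
    }
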
>
> Second, the call in handle_osd_map() should probably go in the loop a bit
> further down that is consuming maps.  It probably won't matter most of the
> time, but I'm paranoid about corner conditions.  It also avoids iterating
> over the new OSDMaps multiple times in the common case where there is no
> cluster_snap (a minor win).
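
A rough sketch of that placement, assuming a hypothetical
get_cluster_snapshot() accessor and the clamped start epoch from
Alexandre's fix:

    // sketch only: check for a cluster_snap while the existing loop is
    // already consuming the newly received maps, so there is no extra
    // pass over the epochs when no snapshot was requested
    for (epoch_t cur = start; cur <= m->get_last(); cur++) {
      OSDMapRef newmap = get_map(cur);
      string snap = newmap->get_cluster_snapshot();  // hypothetical accessor
      if (!snap.empty())
        store->snapshot(snap);                       // ObjectStore-level checkpoint
      // ... the rest of the per-epoch handling stays as it is ...
    }
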
>
> Finally, eventually we should make this do a checkpoint on the mons too.
> We can add the osd snapping back in first, but before this can/should
> really be used the mons need to be snapshotted as well.  Probably that's
> just adding in a snapshot() method to MonitorStore.h and doing either a
> leveldb snap or making a full copy of store.db... I forget what leveldb is
> capable of here.
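
One note on the leveldb side: as far as I know a leveldb snapshot
(GetSnapshot()) only gives a consistent read view inside the running
process and does not persist across a restart, so a durable mon
checkpoint probably means copying or dumping store.db.  A hypothetical
sketch, with copy_dir() as a placeholder helper rather than an existing
API:

    int MonitorDBStore::snapshot(const string& name)   // hypothetical method
    {
      string src = path + "/store.db";
      string dst = path + "/store.db.snap_" + name;
      // naive approach: recursively copy src to dst while the store is
      // quiesced; a nicer version could iterate the live leveldb handle
      // and write the pairs into a fresh DB at dst
      return copy_dir(src, dst);
    }
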
>

I think we also need to snapshot the osd journal; otherwise journal
replay on the next mount could roll the filestore forward past the
snapshotted state.

Regards
Yan, Zheng
