From: Gregory Farnum <greg@gregs42.com>
To: John Spray <john.spray@redhat.com>
Cc: Josh Durgin <jdurgin@redhat.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: RBD mirroring design draft
Date: Thu, 28 May 2015 07:07:16 -0700
Message-ID: <CAC6JEv-3jZkGcLLOUNTnmCv=s_nxW=3opq=pT7z+zUoTW_ubjQ@mail.gmail.com>
In-Reply-To: <5566F0FF.2020400@redhat.com>

On Thu, May 28, 2015 at 3:42 AM, John Spray <john.spray@redhat.com> wrote:
>
>
> On 28/05/2015 06:37, Gregory Farnum wrote:
>>
>> On Tue, May 12, 2015 at 5:42 PM, Josh Durgin <jdurgin@redhat.com> wrote:
>>> Parallelism
>>> ^^^^^^^^^^^
>>>
>>> Mirroring many images is embarrassingly parallel. A simple unit of
>>> work is an image (more specifically a journal, if e.g. a group of
>>> images shared a journal as part of a consistency group in the future).
>>>
>>> Spreading this work across threads within a single process is
>>> relatively simple. For HA, and to avoid a single NIC becoming a
>>> bottleneck, we'll want to spread out the work across multiple
>>> processes (and probably multiple hosts). rbd-mirror should have no
>>> local state, so we just need a mechanism to coordinate the division of
>>> work across multiple processes.
>>>
>>> One way to do this would be layering on top of watch/notify. Each
>>> rbd-mirror process in a zone could watch the same object, and shard
>>> the set of images to mirror based on a hash of image ids onto the
>>> current set of rbd-mirror processes sorted by client gid. The set of
>>> rbd-mirror processes could be determined by listing watchers.
>>
>> You're going to have some tricky cases here when reassigning authority
>> as watchers come and go, but I think it should be doable.
>
>
> I've been fantasizing about something similar to this for CephFS backward
> scrub/recovery.  My current code supports parallelism, but relies on the
> user to script their population of workers across client nodes.
>
> I had been thinking of more of a master/slave model, where one guy would
> get to be the master by e.g. taking the lock on an object, and he would then
> hand out work to everyone else that was a watch/notify subscriber to the
> magic object.  It seems like that could be simpler than having workers
> work out independently what their workload should be, and it would have the
> added bonus of providing a command-like mechanism in addition to continuous
> operation.

Heh. This could work, but I'd caution people that it's a brand-new
use case for watch-notify and I'm not sure how it would perform. I
suspect we'd need to keep the chunks of work fairly large so that
watch-notify round-trip latency doesn't become the limiting
factor. ;)

Speaking more generally, unless a peer-based model turns out to be
infeasible, I much prefer it: such systems are sometimes more
complicated, but they are generally much more resilient to failures
and tend to be better designed for recovery than ones where everything
resides in the master's memory and has to be reconstructed.
-Greg

Thread overview: 9+ messages
2015-05-13  0:42 RBD mirroring design draft Josh Durgin
2015-05-13  7:48 ` Haomai Wang
2015-05-13  8:07 ` Haomai Wang
2015-05-14  4:21   ` Josh Durgin
     [not found]     ` <CAAW3nmh+XxB8K2XsWgnD_cWWPZGw=VpsuomodMM1SNad8LmZAQ@mail.gmail.com>
2015-05-20 21:30       ` Josh Durgin
     [not found]         ` <CAAW3nmjWQTOOhym5t6LQ8E0P8AsHnD0c0MkfbF2zre_oUJFudw@mail.gmail.com>
2015-05-21 15:34           ` Josh Durgin
2015-05-28  5:37 ` Gregory Farnum
2015-05-28 10:42   ` John Spray
2015-05-28 14:07     ` Gregory Farnum [this message]
