From: Ori Mamluk <omamluk@zerto.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"תומר בן אור" <tomer@zertodata.com>, "עודד קדם" <oded@zerto.com>,
	dlaor@redhat.com, qemu-devel@nongnu.org,
	"Luiz Capitulino" <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH] replication agent module
Date: Tue, 07 Feb 2012 16:18:06 +0200
Message-ID: <4F31329E.50204@zerto.com>
In-Reply-To: <CAJSP0QWdEXLPJ2gNRtvEdJV38zijCONTf8MNk9tUvEzWum70-g@mail.gmail.com>

On 07/02/2012 15:50, Stefan Hajnoczi wrote:

First, let me say that I'm not yet used to inline replies, so I
initially missed parts of your mail.
> On Tue, Feb 7, 2012 at 1:34 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>> Am 07.02.2012 11:29, schrieb Ori Mamluk:
>>> Repagent is a new module that allows an external replication system to
>>> replicate a volume of a Qemu VM.
> I recently joked with Kevin that QEMU is on its way to reimplementing
> the Linux block and device-mapper layers.  Now we have drbd, thanks!
> :P
>
> Except for image files, the way to do this on a Linux host would be
> using drbd block devices.  We still haven't figured out a nice way to
> make image files full-fledged Linux block devices, so we're
> reimplementing all the block code in QEMU userspace.
>
>>> This RFC patch adds the repagent client module to Qemu.
>>>
>>> Documentation of the module role and API is in the patch at
>>> replication/qemu-repagent.txt
>>>
>>> The main motivation behind the module is to allow replication of VMs in
>>> a virtualization environment like RhevM.
>>>
>>> To achieve this we need basic replication support in Qemu.
>>>
>>> This is the first submission of this module, which was written as a
>>> Proof Of Concept, and used successfully for replicating and recovering a
>>> Qemu VM.
>> I'll mostly ignore the code for now and just comment on the design.
>>
>> One thing to consider for the next version of the RFC would be to split
>> this into a series of smaller patches. This one has become quite large,
>> which makes it hard to review (and yes, please use git send-email).
>>
>>> Points and open issues:
>>>
>>> * The module interfaces the Qemu storage stack at block.c
>>> generic layer. Is this the right place to intercept/inject IOs?
>> There are two ways to intercept I/O requests. The first one is what you
>> chose, just add some code to bdrv_co_do_writev, and I think it's
>> reasonable to do this.
>>
>> The other one would be to add a special block driver for a replication:
>> protocol that writes to two different places (the real block driver for
>> the image, and the network connection). Generally this feels even a bit
>> more elegant, but it brings new problems with it: For example, when you
>> create an external snapshot, you need to pay attention not to lose the
>> replication because the protocol is somewhere in the middle of a backing
>> file chain.
>>
>>> * The patch contains performing IO reads invoked by a new
>>> thread (a TCP listener thread). See repaget_read_vol in repagent.c. It
>>> is not protected by any lock – is this OK?
>> No, definitely not. Block layer code expects that it holds
>> qemu_global_mutex.
>>
>> I'm not sure if a thread is the right solution. You should probably use
>> something that resembles other asynchronous code in qemu, i.e. either
>> callback or coroutine based.
> There is a flow control problem here which is interesting.  If the
> rephub is slower than the writer or unavailable, then eventually we
> either need to stop replicating writes or we need to throttle the
> guest writes.  I haven't read through the whole patch yet but the flow
> control solution is very closely tied to how you use
> threads/coroutines and how you use network sockets.
In general, replication is naturally less important than the main
(production) volume, so the solution aims never to throttle guest writes.
At the current stage, both IOs must complete before we report back to
the guest, but the volume IO is a real write to storage, while the
Rephub IO may involve only a copy to memory.
Later on we can stop waiting for the replicated IO altogether by adding
a bitmap, but that is for a later stage.
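
To make the bitmap idea concrete, here is a minimal sketch in C. It is
not taken from the repagent patch: every name in it (RepDirtyBitmap,
rep_bitmap_mark, rep_bitmap_take_next, REP_CHUNK_SECTORS) and the 64 KiB
chunk size are assumptions made for illustration. The assumed design is
that the write path only marks the written range as dirty and returns,
while a separate sync task later picks up dirty chunks, reads them from
the volume and sends them to the Rephub.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracking granularity: 128 sectors of 512 bytes = 64 KiB. */
#define REP_CHUNK_SECTORS 128u

/* Hypothetical dirty bitmap: one bit per chunk of the protected volume. */
typedef struct RepDirtyBitmap {
    uint64_t nb_chunks;
    unsigned char *bits;
} RepDirtyBitmap;

/* Allocate a bitmap covering nb_sectors, with all chunks clean. */
RepDirtyBitmap *rep_bitmap_new(uint64_t nb_sectors)
{
    RepDirtyBitmap *bm = malloc(sizeof(*bm));
    size_t nb_bytes;

    if (!bm) {
        return NULL;
    }
    bm->nb_chunks = (nb_sectors + REP_CHUNK_SECTORS - 1) / REP_CHUNK_SECTORS;
    nb_bytes = (size_t)((bm->nb_chunks + 7) / 8);
    bm->bits = calloc(nb_bytes ? nb_bytes : 1, 1);
    if (!bm->bits) {
        free(bm);
        return NULL;
    }
    return bm;
}

void rep_bitmap_free(RepDirtyBitmap *bm)
{
    if (bm) {
        free(bm->bits);
        free(bm);
    }
}

int rep_bitmap_is_dirty(const RepDirtyBitmap *bm, uint64_t chunk)
{
    assert(bm != NULL);
    if (chunk >= bm->nb_chunks) {
        return 0;
    }
    return (bm->bits[chunk / 8] >> (chunk % 8)) & 1;
}

/* Write path: mark the written range dirty instead of waiting for the hub. */
void rep_bitmap_mark(RepDirtyBitmap *bm, uint64_t sector, uint64_t nb_sectors)
{
    uint64_t first, last, c;

    assert(bm != NULL);
    if (nb_sectors == 0 || bm->nb_chunks == 0) {
        return;
    }
    first = sector / REP_CHUNK_SECTORS;
    last = (sector + nb_sectors - 1) / REP_CHUNK_SECTORS;
    if (first >= bm->nb_chunks) {
        return;
    }
    if (last >= bm->nb_chunks) {
        last = bm->nb_chunks - 1;
    }
    for (c = first; c <= last; c++) {
        bm->bits[c / 8] |= (unsigned char)(1u << (c % 8));
    }
}

/*
 * Sync path: find the first dirty chunk at or after `from`, clear its bit
 * and return its index, or -1 if no dirty chunk is left. The caller would
 * then read that chunk from the volume and send it to the Rephub.
 */
int64_t rep_bitmap_take_next(RepDirtyBitmap *bm, uint64_t from)
{
    uint64_t c;

    assert(bm != NULL);
    for (c = from; c < bm->nb_chunks; c++) {
        if (rep_bitmap_is_dirty(bm, c)) {
            bm->bits[c / 8] &= (unsigned char)~(1u << (c % 8));
            return (int64_t)c;
        }
    }
    return -1;
}
```

The sketch leaves out locking and persistence. In Qemu the bitmap would
be touched from both the write path and the sync path, so it would need
to follow whatever synchronization rules the block layer imposes, and
it would have to survive a restart if the replica is to stay consistent.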

>
>>> + * Read a protected volume - allows the Rephub to read a
>>> protected volume, to enable the protected hub to synchronize the content
>>> of a protected volume.
>> We were discussing using NBD as the protocol for any data that is
>> transferred from/to the replication hub, so that we can use the existing
>> NBD client and server code that qemu has. Seems you came to the
>> conclusion to use different protocol? What are the reasons?
>>
>> The other message types could possibly be implemented as QMP commands. I
>> guess we might need to attach multiple QMP monitors for this to work
>> (one for libvirt, one for the rephub). I'm not sure if there is a
>> fundamental problem with this or if it just needs to be done.
> Agreed.  You can already query block devices using QMP 'query-block'.
> By adding in-process NBD server support you could then launch an NBD
> server for each volume which you wish to replicate.  However, in this
> case it sounds almost like you want the reverse - you could provide an
> NBD server on the rephub and QEMU would mirror writes to it (the NBD
> client code is already in QEMU).
>
> There is also interest from other external software (like libvirt) to
> be able to read volumes while the VM is running.
>
> BTW, do you poll the volumes or how do you handle hotplug?  Does
> anything special need to be done when a volume is unplugged?
We assume that hotplug is handled top-down, via the management system,
and not from the VM.
In general, we don't protect 'all volumes' of a VM: the management
system (either RhevM or Rephub, depending on the design) explicitly
instructs us to start protecting a volume.
> Stefan
