From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4F40FBD6.2000500@zerto.com>
Date: Sun, 19 Feb 2012 15:40:38 +0200
From: Ori Mamluk
Subject: Re: [Qemu-devel] [RFC] Replication agent design (was [RFC PATCH] replication agent module)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Tomer Ben Or, Oded Kedem, dlaor@redhat.com, qemu-devel@nongnu.org, Yair Kuszpet, Paolo Bonzini

On 08/02/2012 16:59, Stefan Hajnoczi wrote:
> On Wed, Feb 8, 2012 at 1:28 PM, Ori Mamluk wrote:
> You mentioned a future feature that sends request metadata (offset,
> length) to the rephub synchronously so that protection is 100%.
> (Otherwise a network failure or crash might result in missed writes
> that the rephub does not know about.)
>
> The NBD tap might not be the right channel for sending synchronous
> request metadata, since the protocol is geared towards block I/O
> requests that include the actual data. I'm not sure that QMP should
> be used either - even though we have the concept of QMP events -
> because it's not a low-latency, high-ops communications channel.
>
> Which channel do you use in your existing products for synchronous
> request metadata?
>
> Stefan

Looking a little deeper into the NBD solution, it has another problematic angle. Assuming RHEV is managing the system, it will need to allocate a port per volume on the host, and I don't see a clean way to do that. Also, the idea of opening three process-external APIs for the replication (NBD client, NBD server, metadata tap) doesn't feel right to me.

Going back to Anthony's older mail:

> We're doomed to reinvent all of the Linux storage layer it seems. I
> think we really only have two choices: make better use of kernel
> facilities for this (like drbd) or have a proper, pluggable, storage
> interface so that QEMU proper doesn't have to deal with all of this.
>
> Gluster is appealing as a pluggable storage interface although the
> license is problematic for us today.
>
> I'm quite confident that we shouldn't be in the business of
> replicating storage though. If the answer is NBD++, that's fine too.

I think it might be better to go back to my original, less generic design. We can regard it as a 'plugin' for a specific application - in this case, replication. I can add a plugin interface to the generic block layer that allows building a proper storage stack. The plugin will have capabilities like a filter driver: it gets hold of each request on its way down (from VM to storage) and on its way up (I/O completion), and can block or stall both.
As for the plugin mechanism - it's clear to me that a dynamically loaded plugin is out of the question. It can instead be a convention: for example, a 'plugins' directory under block/ that contains the plugin code, with plugins activated via command line options or QMP commands. This way we create a separation between the QEMU code and the storage filters. The downside is that plugin code tends to be less generic and reusable. The advantage is that by separating them, we don't complicate the QEMU storage stack code with application-specific requirements.

How about it?

Ori.
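P.S. For concreteness, activation via QMP might look something like the fragment below. The command name and its arguments are entirely invented to illustrate the shape I have in mind; nothing like this exists yet:

```json
{ "execute": "block-plugin-add",
  "arguments": { "device": "virtio0",
                 "plugin": "replication",
                 "rephub": "192.168.0.5:4444" } }
```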