From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4F40FBD6.2000500@zerto.com>
Date: Sun, 19 Feb 2012 15:40:38 +0200
From: Ori Mamluk
Subject: Re: [Qemu-devel] [RFC] Replication agent design (was [RFC PATCH] replication agent module)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Tomer Ben Or, Oded Kedem, dlaor@redhat.com, qemu-devel@nongnu.org, Yair Kuszpet, Paolo Bonzini

On 08/02/2012 16:59, Stefan Hajnoczi wrote:
> On Wed, Feb 8, 2012 at 1:28 PM, Ori Mamluk wrote:
> You mentioned a future feature that sends request metadata (offset,
> length) to the rephub synchronously so that protection is 100%.
> (Otherwise a network failure or crash might result in missed writes
> that the rephub does not know about.)
>
> The NBD tap might not be the right channel for sending synchronous
> request metadata, since the protocol is geared towards block I/O
> requests that include the actual data. I'm not sure that QMP should
> be used either - even though we have the concept of QMP events -
> because it's not a low-latency, high-ops communications channel.
>
> Which channel do you use in your existing products for synchronous
> request metadata?
>
> Stefan

Looking a little deeper into the NBD solution, it has another problematic angle. Assuming RHEV is managing the system, it will need to allocate a port per volume on the host, and I don't see a clean way to do that. Also, the idea of opening three process-external APIs for the replication (NBD client, NBD server, metadata tap) doesn't feel right to me.

Going back to Anthony's older mail:

> We're doomed to reinvent all of the Linux storage layer it seems. I
> think we really only have two choices: make better use of kernel
> facilities for this (like drbd) or have a proper, pluggable, storage
> interface so that QEMU proper doesn't have to deal with all of this.
>
> Gluster is appealing as a pluggable storage interface although the
> license is problematic for us today.
>
> I'm quite confident that we shouldn't be in the business of
> replicating storage though. If the answer is NBD++, that's fine too.

I think it might be better to go back to my original, less generic design. We can regard it as a 'plugin' for a specific application - in this case, replication. I can add a plugin interface to the generic block layer that allows building a proper storage stack. The plugin will have capabilities like a filter driver: it gets hold of each request on its way down (from VM to storage) and on its way up (I/O completion), and can block or stall both.
As for the plugin mechanism - it's clear to me that a dynamically loaded plugin is out of the question. It can instead be a convention: for example, a 'plugins' directory under block/ that contains the plugin code, with plugins activated via command line options or QMP commands. This way we create a separation between the QEMU code and the storage filters. The downside is that plugin code tends to be less generic and reusable. The advantage is that by separating them, we don't complicate the QEMU storage stack code with application-specific requirements.

How about it?

Ori.
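P.S. For concreteness, activation via QMP might look something like the fragment below. The command name and its arguments are entirely invented to illustrate the shape I have in mind; nothing like this exists yet:

```json
{ "execute": "block-plugin-add",
  "arguments": { "device": "virtio0",
                 "plugin": "replication",
                 "rephub": "192.168.0.5:4444" } }
```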